Abstract — In terabit-density magnetic recording, several bits
of data can be replaced by the values of their neighbors in the 
storage medium. As a result, errors in the medium are dependent 
on each other and also on the data written. We consider a simple 
one-dimensional combinatorial model of this medium. In our 
model, we assume a setting where binary data is sequentially 
written on the medium and a bit can erroneously change to 
the immediately preceding value. We derive several properties 
of codes that correct this type of errors, focusing on bounds on 
their cardinality. 

We also define a probabilistic finite-state channel model of the 
storage medium, and derive lower and upper estimates of its 
capacity. A lower bound is derived by evaluating the symmetric 
capacity of the channel, i.e., the maximum transmission rate 
under the assumption of the uniform input distribution of the 
channel. An upper bound is found by showing that the original 
channel is a stochastic degradation of another, related channel 
model whose capacity we can compute explicitly. 



I. Introduction 

One of the challenges in achieving ultra-high-density mag- 
netic recording lies in accounting for the effect of the granular- 
ity of the recording medium. Conventional magnetic recording 
media are composed of fundamental magnetizable units, called 
"grains", that do not have a fixed size or shape. Information is 
stored on the medium through a write mechanism that sets the 
magnetic polarities of the grains [8]. There are two types of
magnetic polarity, and each grain can be magnetized to take 
on exactly one of these two polarities. Thus, each grain can 
store at most one bit of information. Clearly, if the boundaries 
of the grains were known to the write mechanism and the 
readback mechanism, then it would be theoretically possible 
to achieve a storage capacity of one information bit per grain. 

There are two bottlenecks to achieving the one-bit-per-grain
storage capacity: (i) the existing write (and readback)
technologies are not capable of setting (and reading back) the
magnetic polarities of a region as small as a single grain; and 
(ii) the write and readback mechanisms are typically unaware 
of the shapes and positions of the grains in the medium. In 
current magnetic recording technologies, writing is generally 
done by dividing the magnetic medium into regularly-spaced 
bit cells, and writing one bit of data into each of these bit 
cells. The bit cells are much larger in size compared to the 
grains, so that each bit cell comprises many grains. Writing a 
bit into a bit cell is then a matter of uniformly magnetizing all 
the grains within the cell; the effect of grains straddling the 
boundary between two bit cells can be neglected. 

Recently, Wood et al. [9] proposed a new write mechanism
that can magnetize areas commensurate with the size of individual
grains. With such a write mechanism and a corresponding
readback mechanism in place, the remaining bottleneck to 
achieving magnetic recording densities as high as 10 Terabits 
per square inch is that the write and readback mechanisms do 
not have precise knowledge of the grain boundaries. 

The authors of [9] went on to consider the information loss 
caused by the lack of knowledge of grain boundaries. A sample 
simulation considered a two-dimensional magnetic medium 
composed of 100 randomly shaped grains, and subdivided into 
a 14 × 14 grid of uniformly-sized bit cells. Bits were written in
raster-scan fashion onto the grid. At the kth step of the write
process, if any grain had more than a 30% (in area) overlap
with the bit cell to be written at that step, then that grain was
given the polarity value of the kth bit. The polarity of a grain
could switch multiple times before settling on a final value. 
With a readback mechanism that reported the polarity value 
at the centre of each bit cell, their simulation recorded the 
proportion of bits that were reported with the wrong polarity. 
A similar simulation, but with a slightly different assumption 
on the underlying grain distribution, was reported in [6].

The authors of [9] also considered a simple channel that
modeled a one-dimensional granular medium, and computed 
a lower bound on the capacity of the channel. The one- 
dimensional medium was divided into regularly-spaced bit 
cells, and it was assumed that grain boundaries coincided with 
bit cell boundaries, and that the grains had randomly selected 
lengths equal to 1, 2 or 3 bit cells. The polarity of a grain is 
set by the last bit to be written within it. The effect of this is 
that the last bit to be written in the grain overwrites all bits 
previously written within the same grain. 

In this paper, we restrict ourselves to the one-dimensional 
case, and consider a combinatorial error model that corre- 
sponds to the granular medium described above. The medium 
comprises n bit cells, indexed by the integers from 1 to n. The
granular structure of the medium is described by an increasing
sequence of positive integers, $1 = j_1 < j_2 < \cdots < j_s \le n$,
where $j_i$ denotes the index of the bit cell at which the $i$th grain
begins. Note that the length of the $i$th grain is $\ell_i = j_{i+1} - j_i$
(we set $j_{s+1} = n + 1$ to be consistent).

The effect of a given grain pattern on an n-bit block of
binary data $x = (x_1, x_2, \ldots, x_n)$ to be written onto the
medium is represented by an operator $\phi$ that acts upon $x$ to
produce $\phi(x) = (y_1, y_2, \ldots, y_n)$, which is the binary vector
that is actually recorded on the medium. For notational ease,
our model assumes that it is the first bit to be written within
a grain that sets the polarity of the grain. Thus, for indices
$j$ within the $i$th grain, i.e., for $j_i \le j < j_{i+1}$, we have
$y_j = x_{j_i}$. This means that the $i$th grain introduces an error
in the recorded data (i.e., a situation where $y_j \ne x_j$) precisely
when $x_j \ne x_{j_i}$ for some $j$ satisfying $j_i < j < j_{i+1}$. In
particular, grains of length 1 do not introduce any errors.

As an example, consider a medium divided into 15 bit cells, 
with a granular structure consisting of grains of lengths 1 and 
2 only, with the length-2 grains beginning at indices 3, 6, 8 
and 13. The grains in the medium would transform the vector
$x = (100001000010000)$ to $\phi(x) = (100001100010000)$ and the
vector $x = (000101011100010)$ to $\phi(x) = (000001111100000)$.
Note that $\phi(x) \ne x$ iff a 01 or a 10 falls within some grain.
In particular, $\phi(\phi(x)) = \phi(x)$ for any $x$.
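The 15-cell example above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `apply_grains` and the string representation of bit vectors are assumptions made here, not notation from the paper.

```python
def apply_grains(x, grain_starts):
    """Write the bit string x onto a medium whose length-2 grains begin at the
    given 1-based indices; every other grain is assumed to have length 1."""
    y = list(x)
    for j in grain_starts:
        # The first bit written in a grain sets its polarity, so the cell at
        # 1-based index j + 1 takes the value written at index j.
        y[j] = x[j - 1]
    return "".join(y)


starts = [3, 6, 8, 13]  # length-2 grains of the example
print(apply_grains("100001000010000", starts))
print(apply_grains("000101011100010", starts))
```

Applying the function twice with the same grain pattern returns the same string as applying it once, which matches the idempotence noted in the text.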

In this paper, we consider only the case of granular media 
composed of grains of length at most 2. Even this simplest pos- 
sible case brings out the complexity of the problem of coding 
to correct errors caused by this combinatorial model. Most of 
the results we present can be extended straightforwardly to the 
case of magnetic media with a more general grain distribution. 

Note that in a medium with grains of length at most 2, it 
is precisely the length-2 grains that can cause bit errors. We 
denote by $\Phi_{n,t}$ the set of operators $\phi$ corresponding to all such
media with n bit cells and at most t grains of length equal to 2.
Then, for $x \in \{0,1\}^n$, we let $\Phi_{n,t}(x) = \{\phi(x) : \phi \in \Phi_{n,t}\}$,
and call two vectors $x_1, x_2 \in \{0,1\}^n$ t-confusable if

$$\Phi_{n,t}(x_1) \cap \Phi_{n,t}(x_2) \ne \emptyset.$$

A binary code $C$ of length n is said to correct t grain errors
if no two distinct vectors $x_1, x_2 \in C$ are t-confusable.
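For small n, the set of possible outputs of a vector and the confusability relation can be computed by exhaustive enumeration. The sketch below is a brute-force illustration; the helper names `grain_patterns`, `phi_set` and `confusable` are hypothetical, and the approach is only practical for short vectors.

```python
from itertools import combinations


def grain_patterns(n, t):
    """Yield every tuple of 1-based start indices of at most t non-overlapping
    length-2 grains on a medium of n bit cells."""
    for k in range(t + 1):
        for starts in combinations(range(1, n), k):
            # Two length-2 grains overlap exactly when their starts are adjacent.
            if all(b - a >= 2 for a, b in zip(starts, starts[1:])):
                yield starts


def phi_set(x, t):
    """Return the set of all outputs of the bit string x under at most t
    length-2 grains."""
    outputs = set()
    for starts in grain_patterns(len(x), t):
        y = list(x)
        for j in starts:
            y[j] = x[j - 1]  # second cell of the grain copies the first
        outputs.add("".join(y))
    return outputs


def confusable(x1, x2, t):
    """True if some grain patterns map x1 and x2 to a common output."""
    return not phi_set(x1, t).isdisjoint(phi_set(x2, t))


print(sorted(phi_set("0110", 1)))
print(confusable("01", "00", 1), confusable("01", "10", 1))
```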

In Sections II and III of this paper, we study properties
of t-grain-correcting codes. We derive several bounds on the
maximum size of a length-n binary code that corrects t
grain errors. Our lower bounds are based on either explicit
constructions or existence arguments, while our upper bounds
are based on the count of runs of identical symbols in a vector
or on a clique partition of the "confusability graph" of the
space $\{0,1\}^n$. We also briefly consider list-decodable
grain-correcting codes, and derive a lower bound on the maximum
cardinality of such codes by means of a probabilistic argument.

In Section IV, we consider a scenario in which the locations 
of the grains are available to either the encoder or the decoder 
of the data, and derive estimates of the size of codes in this 
setting. 

In Section V, we consider a probabilistic channel model
that corresponds to the one-dimensional combinatorial model
of errors discussed above, calling it the "grains channel". We
again confine ourselves to length-2 grains. Our objective is to
estimate the capacity of the channel. For a lower bound on
the capacity, we restrict our attention to uniformly distributed,
independent input letters, which corresponds to the case of the
symmetric information rate (symmetric capacity, or SIR) of
the channel. We are able to find an exact expression for the
SIR as an infinite series, which gives a lower bound on the
true capacity. To estimate capacity from above, we relate the
grains channel to an erasure channel in which erasures never 
occur in adjacent symbols, and are otherwise independent. We 
explicitly compute the capacity of this erasure channel, and 
observe that the grains channel is a stochastically degraded 
version of the erasure channel. The capacity of the erasure 
channel is thus an upper bound on the capacity of the grains 
channel. 

We would like to acknowledge a concurrent independent 
paper by Iyengar, Siegel, and Wolf [4], which contains some
of our results from Section V. The authors of [4] considered
a more general channel model that includes our probabilistic 
model of the grains channel as a particular case. Their paper 
contains results that cover our Propositions fTTI and [T8l as well 
as our TheoremQj] However, a major contribution of ours that 
cannot be found in [4] is our Theorem 16, in which we give
an exact expression for the SIR of the grains channel. 

Throughout the paper, $h(x) = -x \log_2 x - (1 - x) \log_2 (1 - x)$
denotes the binary entropy function.

II. Constructions of grain-correcting codes 

As observed above, when the length of the grains does 
not exceed 2, bit errors are caused only by length-2 grains. 
Furthermore, it can only be the second bit within such a grain 
that can be in error. Thus, any code that can correct t bit-flip
errors (equivalently, a code with minimum Hamming distance
at least $2t + 1$) is a t-grain-correcting code. In particular,
t-grain-correcting codes whose parameters meet the Gilbert-Varshamov
bound (see e.g. Q p. 97]) are guaranteed to exist.
But we can sometimes do better than conventional error- 
correcting codes by taking advantage of the special nature of 
grain errors. 

Observe that the first bit to be written onto the medium can
never be in error in the grain model. So, we can construct
t-grain-correcting codes $C$ of length n as follows: take a code
$C'$ of length $n - 1$ that can correct t bit-flip errors, and set
$C = (0|C') \cup (1|C')$. Here, for $b \in \{0,1\}$, $(b|C')$ refers to
the set of vectors obtained by prefixing $b$ to each codevector
of $C'$. For example, when $n = 2^m$, we can take $C'$ to be
the binary Hamming code of length $2^m - 1$, yielding a
1-grain-correcting code $C$ of size $|C| = 2^n/n$. Note that $2^n/n$
exceeds the sphere-packing (Hamming) upper bound, i.e., is
greater than the cardinality of the optimal binary
single-error-correcting code of length $n = 2^m$.

More generally, again when n is a power of 2, we can take
$C'$ to be a binary BCH code of length $n - 1$ that corrects t
bit-flip errors. The above construction then yields a
t-grain-correcting code $C$ of length n and size $|C| \ge 2^n/n^t$.
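The Hamming-code instance of this construction can be checked by brute force for n = 8. The sketch below is an illustration under stated assumptions: it uses one standard choice of parity equations for the [7,4] Hamming code, and the helper names `hamming_7_4` and `single_grain_outputs` are hypothetical.

```python
from itertools import combinations, product


def hamming_7_4():
    """All 16 codewords of a [7,4] binary Hamming code (one standard choice
    of parity equations, with parity bits at positions 1, 2 and 4)."""
    words = []
    for d1, d2, d3, d4 in product((0, 1), repeat=4):
        p1 = d1 ^ d2 ^ d4
        p2 = d1 ^ d3 ^ d4
        p3 = d2 ^ d3 ^ d4
        words.append((p1, p2, d1, p3, d2, d3, d4))
    return words


def single_grain_outputs(x):
    """All outputs of the tuple x under at most one length-2 grain."""
    outputs = {x}
    for j in range(len(x) - 1):
        # A grain covering cells j and j + 1 (0-based) copies x[j] into cell j + 1.
        outputs.add(x[:j + 1] + (x[j],) + x[j + 2:])
    return outputs


# Prefix a free bit to every codeword of the inner code.
code = [(b,) + c for b in (0, 1) for c in hamming_7_4()]

is_1_grain_correcting = all(
    single_grain_outputs(a).isdisjoint(single_grain_outputs(b))
    for a, b in combinations(code, 2)
)
print(len(code), is_1_grain_correcting)
```

Here the code has 2^8/8 = 32 codewords, whereas the sphere-packing bound allows at most 28 codewords for a single-bit-flip-correcting code of length 8.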

We next describe a completely different, and remarkably
simple, construction of a length-n grain-correcting code that
corrects any number of grain errors. For even integers $n = 2m$,
$m \ge 1$, define the code $\mathcal{R}_n \subseteq \{0,1\}^n$ as the set

$$\{(x_1 x_2 \cdots x_{2m}) \in \{0,1\}^n : x_{i-1} = x_i \text{ for all even indices } i\}. \tag{1}$$

Note that when a codevector from $\mathcal{R}_n$ is written onto a
medium composed of grains of length at most 2, the bits at
even coordinates remain unchanged. Indeed, a bit at an even
index $i$ could be in error only if a grain starts at index $i - 1$,
causing the bit at index $i - 1$ to overwrite the bit at index
$i$. However, the two bits are identical by construction. Thus,
$\mathcal{R}_n$ is a code of size $2^{n/2}$ that corrects an arbitrary number of
grain errors. This construction can be extended to odd lengths
$n = 2m + 1$, $m \ge 1$, as follows: $\mathcal{R}_n = (0|\mathcal{R}_{2m}) \cup (1|\mathcal{R}_{2m})$.
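As a sanity check of this construction, the following sketch builds the code for a small length and verifies exhaustively that no two codewords share a possible output, whatever the number of length-2 grains. The function names are hypothetical, and the enumeration is exponential in n, so it is meant only for small examples.

```python
from itertools import combinations, product


def repetition_pairs_code(n):
    """The code of (1): for even n, words with x_{i-1} = x_i at every even
    index i; for odd n, a free first bit followed by a word of length n - 1."""
    if n % 2 == 0:
        return ["".join(b + b for b in bits) for bits in product("01", repeat=n // 2)]
    return [b + w for b in "01" for w in repetition_pairs_code(n - 1)]


def all_outputs(x):
    """Every string obtainable from x with any number of length-2 grains."""
    n = len(x)
    outputs = set()
    for k in range(n // 2 + 1):
        for starts in combinations(range(n - 1), k):  # 0-based grain starts
            if all(b - a >= 2 for a, b in zip(starts, starts[1:])):
                y = list(x)
                for j in starts:
                    y[j + 1] = x[j]
                outputs.add("".join(y))
    return outputs


def corrects_all_grain_errors(code):
    return all(
        all_outputs(a).isdisjoint(all_outputs(b)) for a, b in combinations(code, 2)
    )


for n in (6, 7):
    code = repetition_pairs_code(n)
    print(n, len(code), corrects_all_grain_errors(code))
```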

III. Bounds on the size of grain-correcting codes 

Let $M(n,t)$ denote the maximum size of a length-n binary
code that is t-grain-correcting. The constructions of the previous
section show that $M(n,t) \ge 2^{\lceil n/2 \rceil}$ for any n and t,
and $M(n,t) \ge 2^n/n^t$ when n is a power of 2. In an attempt
to determine the tightness of these lower bounds, we derive 
below some upper bounds on M(n,t). 

A. Upper Bounds Based on Counts of Runs

Denote by $r(x)$ the number of runs (maximal subvectors
of consecutive identical symbols) in the vector $x \in \{0,1\}^n$.
As remarked in Section I, a single grain can change $x$ to a
different vector if and only if the grain straddles the boundary
between two successive runs in $x$. Thus,
$|\Phi_{n,1}(x)| = 1 + (r(x) - 1) = r(x)$. For $t \ge 2$, the number
$|\Phi_{n,t}(x)|$ is not readily expressible in a closed form.
Nevertheless, we have the following lemma.

Lemma 1

$$|\Phi_{n,t}(x)| \ge 1 + \sum_{i=1}^{t} \frac{1}{i!} \prod_{j=0}^{i-1} \bigl(r(x) - 1 - 3j\bigr).$$

Proof: The right-hand side is a worst-case count of the number
of ways in which $i \le t$ length-2 grains can be placed so that
each grain straddles the boundary between successive runs in
$x$. The first grain can be placed in $r(x) - 1$ ways; after that,
in the worst case (which happens when the first grain falls in
the middle of a 1010 or 0101), the next grain can be placed
in $(r(x) - 1) - 3$ ways; and so on. ∎
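The counting argument can be checked numerically for short vectors. The sketch below compares the exact number of outputs, obtained by enumeration, with the right-hand side of Lemma 1. One assumption is made explicit in the code: factors $r(x) - 1 - 3j$ that would be negative are clamped to zero, which is how the worst-case count in the proof is read when there are fewer run boundaries than grains to place. The function names are hypothetical.

```python
from itertools import combinations, product
from math import factorial


def runs(x):
    """Number of runs of identical symbols in the tuple x."""
    return 1 + sum(a != b for a, b in zip(x, x[1:]))


def phi_size(x, t):
    """Exact number of outputs of x under at most t length-2 grains."""
    n = len(x)
    outputs = set()
    for k in range(t + 1):
        for starts in combinations(range(n - 1), k):  # 0-based grain starts
            if all(b - a >= 2 for a, b in zip(starts, starts[1:])):
                y = list(x)
                for j in starts:
                    y[j + 1] = x[j]
                outputs.add(tuple(y))
    return len(outputs)


def lemma1_bound(r, t):
    """Right-hand side of Lemma 1 for a vector with r runs.
    Assumption of this sketch: negative factors are clamped to zero."""
    total = 1.0
    for i in range(1, t + 1):
        prod = 1
        for j in range(i):
            prod *= max(r - 1 - 3 * j, 0)
        total += prod / factorial(i)
    return total


n = 8
ok = all(
    phi_size(x, t) >= lemma1_bound(runs(x), t) - 1e-9
    for x in product((0, 1), repeat=n)
    for t in (1, 2, 3)
)
print(ok)
```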

This leads to the following upper bound on M(n,t). 

Theorem 2 For any fixed value of t,

$$M(n,t) \le \frac{2^n}{n^t}\bigl(t!\,2^t + 2 + o(1)\bigr),$$

where $o(1)$ denotes a term that goes to 0 as $n \to \infty$.

Proof: Let $C$ be a t-grain-correcting code of length n, and let

$$C_1 = \{x \in C : |r(x) - n/2| \le \sqrt{nt \log_2 n}\}.$$

For any $x \in C_1$, we have from Lemma 1

$$|\Phi_{n,t}(x)| \ge \frac{1}{t!}\bigl(r(x) - 1 - 3(t-1)\bigr)^t \ge \frac{1}{t!}\bigl(n/2 - \sqrt{nt \log_2 n} - 1 - 3(t-1)\bigr)^t. \tag{2}$$

Since $C_1$ itself is t-grain-correcting, we also have

$$2^n \ge \Bigl|\bigcup_{x \in C_1} \Phi_{n,t}(x)\Bigr| = \sum_{x \in C_1} |\Phi_{n,t}(x)|. \tag{3}$$

It follows from (2) and (3) that

$$|C_1| \le \frac{2^n}{n^t}\, t!\, 2^t \bigl(1 + o(1)\bigr).$$

Now, let $C_2 = C \setminus C_1$. We shall bound from above the size of $C_2$
by the number of vectors $x \in \{0,1\}^n$ such that
$|r(x) - n/2| > \sqrt{nt \log_2 n}$. Define $\psi : \{0,1\}^n \to \{0,1\}^{n-1}$ by setting

$$\psi((x_1, x_2, \ldots, x_n)) = (x_1 \oplus x_2, x_2 \oplus x_3, \ldots, x_{n-1} \oplus x_n),$$

where $\oplus$ denotes modulo-2 addition. Then, $r(x) = w_H(\psi(x)) + 1$,
where $w_H(\cdot)$ denotes Hamming weight. For
any given vector $y \in \{0,1\}^{n-1}$, there are exactly two vectors
$x_1$, $x_2 = \mathbf{1} \oplus x_1$ such that $\psi(x_1) = \psi(x_2) = y$. Therefore,

$$|C_2| \le 2\,\bigl|\{y \in \{0,1\}^{n-1} : |w_H(y) + 1 - n/2| > \sqrt{nt \log_2 n}\}\bigr| \le 4 \sum_{i=0}^{n/2 - \sqrt{nt \log_2 n}} \binom{n-1}{i} \le 4 \cdot 2^{(n-1)\, h\bigl(\frac{1}{2} - \frac{2\sqrt{nt \log_2 n} - 1}{2(n-1)}\bigr)},$$

where $h(z) = -z \log_2 z - (1 - z) \log_2 (1 - z)$ is the binary
entropy function. Since $h(\frac{1}{2} - x) \le 1 - \frac{2}{\ln 2} x^2$,

$$|C_2| \le 4 \exp\Bigl(\Bigl(n - 1 - \frac{2\,(2\sqrt{nt \log_2 n} - 1)^2}{4(n-1)\ln 2}\Bigr) \ln 2\Bigr) \le 2^{n+1} n^{-t}.$$

We conclude by noting that $|C| = |C_1| + |C_2|$. ∎
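The two properties of the difference map used in the proof, namely that the number of runs equals the Hamming weight of the image plus one and that every image has exactly two preimages, are easy to confirm by enumeration. The sketch below does this for n = 6; the names `psi` and `count_runs` are chosen here for illustration.

```python
from collections import Counter
from itertools import groupby, product


def psi(x):
    """Map (x_1, ..., x_n) to (x_1 xor x_2, ..., x_{n-1} xor x_n)."""
    return tuple(a ^ b for a, b in zip(x, x[1:]))


def count_runs(x):
    """Number of maximal blocks of identical consecutive symbols in x."""
    return sum(1 for _ in groupby(x))


n = 6
vectors = list(product((0, 1), repeat=n))

runs_match_weight = all(count_runs(x) == sum(psi(x)) + 1 for x in vectors)
preimage_counts = Counter(psi(x) for x in vectors)
two_preimages_each = set(preimage_counts.values()) == {2}
print(runs_match_weight, two_preimages_each)
```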

For fixed t, the upper bound of the above theorem is within
a constant multiple of the lower bound $M(n,t) \ge 2^n/n^t$,
stated earlier as being valid when n is a power of 2.

The bound of Theorem 2 is not useful when t grows linearly
with n, say, $t = n\tau$ for $\tau \in (0, 1/2]$. In this case, we define

$$R(\tau) = \limsup_{n \to \infty} \frac{\log_2 M(n, \lfloor n\tau \rfloor)}{n}. \tag{4}$$

An upper bound on $R(\tau)$ for small $\tau$ can be established by
an argument similar to the proof of the previous theorem.



Proposition 3 Let $x^* = x^*(\tau)$ be the smallest positive solution
of the following equation:

$$h\Bigl(\frac{1 - x}{2}\Bigr) + \frac{1 - x}{4}\, h\Bigl(\frac{4\tau}{1 - x}\Bigr) = 1.$$

For $\tau \le 0.0706$, the following bound holds true:

$$R(\tau) \le h\Bigl(\frac{1 - x^*}{2}\Bigr). \tag{5}$$

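Proof: The proof relies on a coarser estimate of $|\Phi_{n,t}(x)|$
than the one in Lemma 1. Consider the boundaries between the
$(2i-1)$-th and $2i$-th runs in $x$, $i = 1, 2, \ldots, \lfloor r(x)/2 \rfloor$.
Length-2 grains can be independently placed across these boundaries,
leading to the lower bound

$$|\Phi_{n,t}(x)| \ge \sum_{i=0}^{t} \binom{\lfloor r(x)/2 \rfloor}{i}. \tag{6}$$

For $t = \lfloor \tau n \rfloor$, let $C$ be a t-grain-correcting code. For some
$\delta > 0$, let

$$C_1 = \bigl\{x \in C : r(x)/2 \ge \lfloor \tfrac{n}{4}(1 - \delta) \rfloor\bigr\}.$$

The bound (6) implies that for each $x \in C_1$,

$$|\Phi_{n,t}(x)| \ge \sum_{i=0}^{t} \binom{\lfloor \frac{n}{4}(1 - \delta) \rfloor}{i}.$$

From the above and (3), we obtain

$$|C_1| \le \frac{2^n}{\sum_{i=0}^{t} \binom{\lfloor \frac{n}{4}(1 - \delta) \rfloor}{i}}.$$

The size of the remaining subset of vectors $C_2 = C \setminus C_1$ does
not exceed the number of all vectors $x$ with $r(x) < \frac{n}{2}(1 - \delta)$,
i.e.,

$$|C_2| \le 2 \sum_{i=0}^{\lfloor \frac{n}{2}(1 - \delta) \rfloor} \binom{n-1}{i} \le 2^{n h\left(\frac{1 - \delta}{2}\right) + 1}.$$

Therefore,

$$|C| \le \min_{\delta} \Bigl\{ \frac{2^n}{\sum_{i=0}^{t} \binom{\lfloor \frac{n}{4}(1 - \delta) \rfloor}{i}} + 2^{n h\left(\frac{1 - \delta}{2}\right) + 1} \Bigr\}.$$

When $t \le \frac{n}{8}(1 - \delta)$, or equivalently, $\delta \le 1 - 8\tau$, the dominant
term in the sum in the denominator above is
$\binom{\lfloor \frac{n}{4}(1 - \delta) \rfloor}{\lfloor \tau n \rfloor}$, which, up to a factor
polynomial in n, is bounded below by $2^{\frac{n}{4}(1 - \delta)\, h\left(\frac{4\tau}{1 - \delta}\right)}$.
From this, we obtain

$$R(\tau) \le \min_{0 \le \delta \le 1 - 8\tau} \max\Bigl\{ 1 - \frac{1 - \delta}{4}\, h\Bigl(\frac{4\tau}{1 - \delta}\Bigr),\; h\Bigl(\frac{1 - \delta}{2}\Bigr) \Bigr\}. \tag{7}$$

Now, for $1 - 8\tau$ to be positive, we need $\tau < 1/8$. For any
fixed $\tau \in [0, 1/8)$, and $\delta \in [0, 1 - 8\tau]$, the function
$f(\delta) = 1 - \frac{1 - \delta}{4} h\bigl(\frac{4\tau}{1 - \delta}\bigr)$ is an increasing function of $\delta$,
while the function $g(\delta) = h\bigl(\frac{1 - \delta}{2}\bigr)$ is a decreasing function of $\delta$.
At $\delta = 0$, we have $g(\delta) \ge f(\delta)$. If, at $\delta = 1 - 8\tau$, we have
$g(\delta) \le f(\delta)$, then it follows that the minimum over $\delta$ in (7) is
achieved when $f(\delta) = g(\delta)$. In other words, the minimizing value of
$\delta$ in this case is precisely the $x^*$ in the statement of the proposition. It
is readily verified that at $\delta = 1 - 8\tau$, we have
$g(\delta) - f(\delta) = h(4\tau) + 2\tau - 1$, which is negative when $\tau \le 0.0706$. ∎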
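The bound of Proposition 3 is straightforward to evaluate numerically. The sketch below locates $x^*$ by bisection on $g(\delta) - f(\delta)$ over $[0, 1 - 8\tau]$, using the functions $f$ and $g$ from the proof; the bisection routine and the function names are choices made for this illustration, not the authors' procedure.

```python
import math


def h(z):
    """Binary entropy function, with h(0) = h(1) = 0."""
    if z <= 0.0 or z >= 1.0:
        return 0.0
    return -z * math.log2(z) - (1.0 - z) * math.log2(1.0 - z)


def f(delta, tau):
    return 1.0 - (1.0 - delta) / 4.0 * h(4.0 * tau / (1.0 - delta))


def g(delta):
    return h((1.0 - delta) / 2.0)


def x_star(tau, iterations=100):
    """Root of g(delta) = f(delta) on [0, 1 - 8 tau], found by bisection."""
    lo, hi = 0.0, 1.0 - 8.0 * tau
    if g(hi) - f(hi, tau) >= 0.0:
        raise ValueError("no sign change: tau is outside the range of the proposition")
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if g(mid) - f(mid, tau) > 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def rate_upper_bound(tau):
    """Right-hand side of bound (5)."""
    return h((1.0 - x_star(tau)) / 2.0)


for tau in (0.01, 0.03, 0.05, 0.07):
    print(tau, round(rate_upper_bound(tau), 4))
```

The sign condition checked at the right endpoint is the quantity $h(4\tau) + 2\tau - 1$ from the end of the proof, so the routine raises an error once $\tau$ leaves the admissible range.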

Bound (5) is plotted in Fig. 1 along with the asymptotic
version of the Gilbert-Varshamov lower bound, which, as
observed in Section II, is also valid for grain-correcting codes.
The methods of the next subsection yield upper bounds on
$R(\tau)$ for a wider range of $\tau$, but these are harder to evaluate than
the bound of Proposition 3.

B. Upper Bounds Based on Clique Partitions 

A clique partition of a graph $G$ is a partition $(V_1, \ldots, V_k)$
of its vertex set $V$ such that the subgraph induced by each $V_j$,
$j = 1, \ldots, k$, is a clique of $G$. Let $\chi(G)$ denote the smallest
size (number of parts) of any clique partition of $G$.

Fig. 1. Upper and lower bounds on the asymptotic coding rate of grain-correcting codes.

Let $G(n,t)$ be a confusability graph of the code space,
defined as follows: the vertex set of $G(n,t)$ is $\{0,1\}^n$, and
two distinct vertices $x, x'$ are joined by an edge iff they are
t-confusable. For notational simplicity, we denote $\chi(G(n,t))$
by $\chi_{n,t}$. We do not assume that t is an integer; for non-integer
values of t, we set $\chi_{n,t} = \chi_{n,\lfloor t \rfloor}$.

To state our next result, we need to extend the definition 
of M(n,t) as follows: M(0,t) = 1 for all t. 

Proposition 4 For $m \le n$ and $s \le t$,

$$M(n,t) \le \chi_{m,s}\, M(n - m, t - s).$$

Proof: Let $C \subseteq \{0,1\}^n$ be a t-grain-correcting code of size
$|C| = M(n,t)$, and let $(V_1, \ldots, V_k)$ be a clique partition
of $G(m,s)$ of size $k = \chi_{m,s}$. For $j = 1, \ldots, k$, define
$C_j = \{(c_1, \ldots, c_n) \in C : (c_1, \ldots, c_m) \in V_j\}$. As the $V_j$'s form
a partition of $\{0,1\}^m$, the $C_j$'s form a partition of $C$. Therefore,
it is enough to show that $|C_j| \le M(n - m, t - s)$ for all $j$. Let

$$C'_j = \{(c_{m+1}, \ldots, c_n) : \exists\, (c_1, \ldots, c_m, c_{m+1}, \ldots, c_n) \in C_j\}.$$

The canonical projection map $\pi : C_j \to C'_j$ is a bijection; to
see this, it is enough to show that it is injective. If $\pi(c) = \pi(\hat{c})$
for $c, \hat{c} \in C_j$, then $c = (c_1, \ldots, c_m, c_{m+1}, \ldots, c_n)$ and
$\hat{c} = (\hat{c}_1, \ldots, \hat{c}_m, c_{m+1}, \ldots, c_n)$ for some $(c_1, \ldots, c_m)$ and
$(\hat{c}_1, \ldots, \hat{c}_m)$ in $V_j$. But, since the subgraph induced by $V_j$
forms a clique in $G(m,s)$, we have that $(c_1, \ldots, c_m)$ and
$(\hat{c}_1, \ldots, \hat{c}_m)$ are s-confusable. Thus, we see that $c, \hat{c}$ are
s-confusable (and hence t-confusable since $s \le t$) unless $c = \hat{c}$.
Hence, $\pi$ is a bijection, so that $|C_j| = |C'_j|$.

We further claim that $C'_j \subseteq \{0,1\}^{n-m}$ is a $(t - s)$-grain-correcting
code, which would show that $|C_j| = |C'_j| \le M(n - m, t - s)$.
Indeed, consider any pair of distinct words $c', d' \in C'_j$.
There exist distinct codewords $(a', c')$ and $(b', d')$ in $C_j$.
By definition of $C_j$, $a'$ and $b'$ are s-confusable. So, if $c'$ and
$d'$ were $(t - s)$-confusable, then $(a', c')$ and $(b', d')$ would be
t-confusable, which cannot happen for distinct codewords in
$C_j$. Hence, $C'_j$ is a $(t - s)$-grain-correcting code. ∎

If $n/m \ge t/s$ (or equivalently, $t/n \le s/m$), then repeated
application of the above proposition yields

$$M(n,t) \le (\chi_{m,s})^{\lfloor t/s \rfloor}\, M(n - m\lfloor t/s \rfloor,\; t - s\lfloor t/s \rfloor),$$

from which we obtain the following corollary.

Corollary 5 If $t/n \le s/m$, then

$$M(n,t) \le (\chi_{m,s})^{\lfloor t/s \rfloor}\, 2^{n - m\lfloor t/s \rfloor}.$$

It is difficult to determine $\chi_{m,s}$ exactly for arbitrary $m, s$.
Upper bounds on $\chi_{m,s}$ can be found by explicit constructions
of clique partitions of $G(m,s)$. Observe that for any
$y \in \{0,1\}^m$, the set $\Phi^{-1}_{m,s}(y) := \{x \in \{0,1\}^m : y \in \Phi_{m,s}(x)\}$
forms a clique in $G(m,s)$. Thus, clique partitions of size $k$ can
be found by identifying sequences $y^1, \ldots, y^k \in \{0,1\}^m$ such
that the sets $\Phi^{-1}_{m,s}(y^j)$, $j = 1, \ldots, k$, cover $\{0,1\}^m$. Note that
the sets $V_j = \Phi^{-1}_{m,s}(y^j) \setminus \bigl(\bigcup_{i<j} V_i\bigr)$, $j = 1, \ldots, k$, then form
a clique partition of $G(m,s)$. We implemented the greedy
algorithm described below to find such a list of sequences
$y^1, \ldots, y^k$, and hence, a clique partition $V_1, \ldots, V_k$.

Algorithm 1 A greedy algorithm for finding clique partitions
in $G(m,s)$.

    determine the sets $\Phi^{-1}_{m,s}(y)$ for all $y \in \{0,1\}^m$;
    set $B(y) = \Phi^{-1}_{m,s}(y)$ for all $y \in \{0,1\}^m$, and set $k = 0$;
    while there exists a $y$ such that $B(y)$ is non-empty do
        $k \leftarrow k + 1$;
        find a $y^k$ such that $|B(y^k)| = \max_{y \in \{0,1\}^m} |B(y)|$;
        set $V_k = B(y^k)$;
        for each $y \in \{0,1\}^m$ do
            $B(y) \leftarrow B(y) \setminus V_k$;
    return $V_1, \ldots, V_k$.

Table I lists upper bounds on $\chi_{m,s}$ obtained via our
implementation of the greedy algorithm. The underlined entries
in the table are known to be exact values of $\chi_{m,s}$, obtained
either from the fact that $\chi_{m,s} \ge M(m,s) \ge 2^{\lceil m/2 \rceil}$, or from
specialized arguments that we omit here.

From Corollary 5 and Table I, we can obtain a suite of
upper bounds on $M(n,t)$ valid for various ranges of n and t;
for example, the entry for $(m,s) = (10,1)$ in the table yields
that $M(n,t) \le 236^t\, 2^{n - 10t}$ for $t/n \le 1/10$. The following
upper bound on $R(\tau)$, which was defined in (4), is also a
direct consequence of Corollary 5.

Corollary 6 For $m, s$ such that $\tau \le s/m$,

$$R(\tau) \le 1 - \tau \Bigl(\frac{m}{s} - \frac{1}{s} \log_2 \chi_{m,s}\Bigr).$$

When used in conjunction with Table I, the above corollary
gives useful upper bounds on $R(\tau)$. For instance, using the
table entry for $(m,s) = (16,4)$, we find that
$R(\tau) \le 1 - \tau\bigl(4 - \frac{1}{4} \log_2 662\bigr) \approx 1 - 1.657\tau$ for $\tau \le 1/4$. Figure 1 plots
the minimum of all the upper bounds on $R(\tau)$ obtainable from
Corollary 6 and the entries of Table I.

Setting $s = \tau m$ in Corollary 6, we obtain
$R(\tau) \le \frac{1}{m} \log_2 \chi_{m,\tau m}$, and hence,

$$R(\tau) \le \inf_{m} \frac{1}{m} \log_2 \chi_{m,\tau m} = \lim_{m \to \infty} \frac{1}{m} \log_2 \chi_{m,\tau m}. \tag{8}$$



The last equality above follows from Fekete's lemma (see e.g.
[2, p. 85]), noting that $f(m) = \log_2 \chi_{m,\tau m}$ is a subadditive
function, i.e., $f(m + n) \le f(m) + f(n)$. The bound in (8) is
presently only of theoretical interest, as the infimum (or limit)
on the right-hand side is difficult to evaluate in general.
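Algorithm 1 of this subsection can be sketched in Python as follows. This is a small-scale illustration rather than the authors' implementation: the data structures, the function names, and in particular the tie-breaking rule (first maximum in iteration order) are assumptions, so the partition sizes it returns need not coincide with the entries of Table I.

```python
from itertools import combinations, product


def phi_set(x, s):
    """All outputs of the tuple x under at most s length-2 grains."""
    m = len(x)
    outputs = set()
    for k in range(s + 1):
        for starts in combinations(range(m - 1), k):  # 0-based grain starts
            if all(b - a >= 2 for a, b in zip(starts, starts[1:])):
                y = list(x)
                for j in starts:
                    y[j + 1] = x[j]
                outputs.add(tuple(y))
    return outputs


def greedy_clique_partition(m, s):
    """Greedy clique partition of the confusability graph G(m, s)."""
    # B[y] starts out as the set of all x that can be transformed into y.
    B = {y: set() for y in product((0, 1), repeat=m)}
    for x in B:
        for y in phi_set(x, s):
            B[y].add(x)
    parts = []
    while any(B.values()):
        best = max(B, key=lambda y: len(B[y]))  # ties: first in iteration order
        part = set(B[best])
        parts.append(part)
        for y in B:
            B[y] -= part  # remove the vertices that are now covered
    return parts


for m, s in ((2, 1), (4, 1), (6, 2)):
    print(m, s, len(greedy_clique_partition(m, s)))
```

Each part is a subset of some set of preimages of a single vector, so any two of its members share a possible output and the part is a clique; the number of parts is therefore an upper bound on the clique partition number for the chosen (m, s).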



C. A List-Decoding Lower Bound 

We briefly venture into the territory of list-decoding in this 
section, and give a lower bound on the achievable coding rate 
of a list-L-decodable code. Recall that in the list-decoding 
setting, the decoder is allowed to produce a list of up to L 
codewords. Formally, a code $C$ is list-L t-grain-correcting if
for any vector $x \in \{0,1\}^n$, $|\{c \in C : x \in \Phi_{n,t}(c)\}| \le L$. In
words, for any received vector $x \in \{0,1\}^n$, there are at most
L codewords that could get transformed to $x$ by the action of
an operator $\phi \in \Phi_{n,t}$.

We will find the following definition useful in what is to
follow. For $\phi \in \Phi_{n,t}$, let $e_\phi$ be the vector
$(e_1, \ldots, e_n) \in \{0,1\}^n$, with $e_j = 1$ iff $\phi$ has a length-2 grain beginning at
the $(j - 1)$th bit cell. Define $\mathcal{E}_{n,t} = \{e_\phi : \phi \in \Phi_{n,t}\}$. Note
that $\mathcal{E}_{n,t}$ consists of all binary "error vectors" of length n
and Hamming weight at most t such that the first coordinate
is always 0 and no two 1's are adjacent. An easy counting
argument shows that

$$|\mathcal{E}_{n,t}| = \sum_{i=0}^{t} \binom{n - i}{i}. \tag{9}$$
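The count in (9) can be confirmed by listing the error vectors directly for small n. The sketch below enumerates all binary vectors with first coordinate 0, weight at most t and no two adjacent 1's, and compares their number with the binomial sum; the function names are chosen here for illustration, and `math.comb` requires Python 3.8 or later.

```python
from itertools import product
from math import comb


def error_vectors(n, t):
    """All vectors of length n with first coordinate 0, Hamming weight at
    most t, and no two adjacent 1's."""
    return [
        e
        for e in product((0, 1), repeat=n)
        if e[0] == 0
        and sum(e) <= t
        and all(not (a and b) for a, b in zip(e, e[1:]))
    ]


def count_formula(n, t):
    """Right-hand side of (9)."""
    return sum(comb(n - i, i) for i in range(t + 1))


for n, t in ((5, 1), (8, 2), (10, 3)):
    print(n, t, len(error_vectors(n, t)), count_formula(n, t))
```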



Denote by $M(n,t;L)$ the maximum size of a list-L
t-grain-correcting code of length n, and define for $0 \le \tau \le 1/2$,

$$R(\tau; L) = \liminf_{n \to \infty} \frac{\log_2 M(n, \lfloor n\tau \rfloor; L)}{n}.$$

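Proposition 7 We have

$$M(n,t;L) \ge \frac{2^{nL/(L+1)}}{\sum_{i=0}^{t} \binom{n - i}{i}},$$

and hence,

$$R(\tau; L) \ge \frac{L}{L+1} - (1 - \tau)\, h\Bigl(\frac{\tau}{1 - \tau}\Bigr)$$

for $\tau \le \frac{1}{2} - \frac{1}{2\sqrt{5}} \approx 0.2764$.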

Proof: For a vector $x \in \{0,1\}^n$ let us define

$$B(x) = \{z \in \{0,1\}^n : x \in \Phi_{n,t}(z)\}.$$

Note that $B(x) \subseteq \{x \oplus e : e \in \mathcal{E}_{n,t}\}$, so that
$|B(x)| \le |\mathcal{E}_{n,t}| = \sum_{i=0}^{t} \binom{n - i}{i}$.

Let us construct the code by choosing $M$ codewords randomly
and uniformly with replacement from $\{0,1\}^n$. For a
fixed vector $y \in \{0,1\}^n$, call the choice of any $L + 1$ codewords
$c_1, \ldots, c_{L+1}$ 'bad' if $c_1, \ldots, c_{L+1} \in B(y)$. Clearly,
the expected number of bad choices for a random code $C$ is
less than or equal to

$$2^n \binom{M}{L+1} \Bigl(\frac{\sum_{i=0}^{t} \binom{n - i}{i}}{2^n}\Bigr)^{L+1} < M^{L+1} \Bigl(\sum_{i=0}^{t} \binom{n - i}{i}\Bigr)^{L+1} 2^{-nL}.$$

Take $M = 2^{nL/(L+1)} \big/ \sum_{i=0}^{t} \binom{n - i}{i}$; then the
ensemble-average number of bad $(L+1)$-tuples is less than 1. Therefore
there exists a code of size $M$ in which all the $(L+1)$-tuples
of codewords are good. This implies the lower bound
on $M(n,t;L)$.

The bound on $R(\tau; L)$ follows from the observation that
$\binom{n - i}{i}$ increases with $i$ for $i \le \frac{1}{10}\bigl(5n + 3 - \sqrt{5n^2 + 10n + 9}\bigr)$.
Thus, as long as $t/n \le \frac{1}{2} - \frac{1}{2\sqrt{5}}$, the asymptotics of the
summation $\sum_{i=0}^{t} \binom{n - i}{i}$ is determined by the term
$\binom{n - t}{t}$. ∎

We do not at present have a useful upper bound on
$M(n,t;L)$.

    s\m      2      3      4      5      6      7      8      9     10     11     12     13     14     15     16
      1      2      4      6     10     18     36     66    122    236    428    834   1574   3008   5716  11014
      2      -      -      4      8     12     18     30     54     92    162    284    530    948   1730   3210
      3      -      -      -      -      8     16     24     34     56     88    138    238    418    716   1266
      4      -      -      -      -      -      -     16     32     44     64     98    156    248    392    662

TABLE I

Upper bounds on $\chi_{m,s}$ obtained by computer search; the underlined table entries are known to be exact values of $\chi_{m,s}$.

IV. Grain pattern known to encoder/decoder 

In this section, we assume that the user of the recording 
system is capable of testing the medium and acquiring infor- 
mation about the structure of its grains. This information is 
used for the writing of the data on the medium or performing 
the decoding. Specifically, we assume again a medium with 
n bit cells and at most t grains of length 2, but now the 
locations of the grains are available either to the decoder but 
not the encoder of the data (Scenario I) or, conversely, to the 
encoder but not the decoder (Scenario II). Accordingly, let 
$M_i(n,t)$, $i = 1, 2$, be the maximum number of messages that
can be encoded and decoded without error in each of the two
scenarios. Also, for $0 \le \tau \le 1/2$, let

$$R_i(\tau) = \liminf_{n \to \infty} \frac{1}{n} \log_2 M_i(n, \lfloor n\tau \rfloor), \quad i = 1, 2,$$

be the coding rate achievable in each situation when t grows
proportionally with n, with constant of proportionality $\tau$.

For the analysis to follow, we need to recall the definition
of $\mathcal{E}_{n,t}$ from Section III-C, and the fact (9) that
$|\mathcal{E}_{n,t}| = \sum_{i=0}^{t} \binom{n - i}{i}$.

A. Scenario I 

Here, we assume that the locations of the grains are known 
to the decoder of the data but are not available at the time of 
writing on the medium. A code $C$ is said to correct t grains
known to the receiver if $\phi(x_1) \ne \phi(x_2)$ for any two distinct
vectors $x_1, x_2 \in C$ and any $\phi \in \Phi_{n,t}$.

An obvious solution for the decoder is to consider as
erasures the positions that could be in error, so the encoder can
rely on a t-erasure-correcting code. Therefore, by the argument
of the Gilbert-Varshamov bound,
$M_1(n,t) \ge 2^n \big/ \sum_{i=0}^{t} \binom{n}{i}$, and
hence, $R_1(\tau) \ge 1 - h(\tau)$. However, this lower bound can be
improved, as our next proposition shows.

Proposition 8 We have

$$M_1(n,t) \ge \frac{2^n}{\sum_{i=0}^{t} \binom{n - i}{i}}.$$

Hence, $R_1(\tau) \ge 1 - (1 - \tau)\, h\bigl(\frac{\tau}{1 - \tau}\bigr)$ for $\tau \le \frac{1}{2} - \frac{1}{2\sqrt{5}} \approx 0.2764$.

Proof: We shall construct a code $C$ of size at least $2^n/|\mathcal{E}_{n,t}|$
by a greedy procedure. We begin with an empty set, choose
an arbitrary vector $x_1$ and include it in $C$. Having picked
$x_1, \ldots, x_{i-1}$, for some $i > 1$, we choose $x_i$ so that

$$x_i \notin \bigcup_{j=1}^{i-1} \{x_j \oplus e : e \in \mathcal{E}_{n,t}\}.$$

We stop when such a choice is not possible. At that point, we
will have constructed a code $C$ that satisfies $|C| \cdot |\mathcal{E}_{n,t}| \ge 2^n$.

We claim that $C$ corrects t grains known to the receiver.
Suppose not; then there exists a grain pattern $\phi \in \Phi_{n,t}$
such that $\phi(x_i) = \phi(x_j)$ for some $x_i, x_j \in C$, $i > j$.
Equivalently, $x_i \oplus e = x_j \oplus e'$ for some error vectors $e, e'$
with $\mathrm{supp}(e), \mathrm{supp}(e') \subseteq \mathrm{supp}(e_\phi)$, where $\mathrm{supp}(\cdot)$ denotes the
support of a vector. We then have $x_i = x_j \oplus (e \oplus e')$ with
$e \oplus e' \in \mathcal{E}_{n,t}$, which contradicts the construction of $C$.

As in the proof of Proposition 7, the bound on $R_1(\tau)$
follows from the observation that when $t/n \le \frac{1}{2} - \frac{1}{2\sqrt{5}}$, the
asymptotics of the summation $\sum_{i=0}^{t} \binom{n - i}{i}$ is determined by
the term $\binom{n - t}{t}$. ∎
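The greedy procedure in this proof can be run directly for small parameters. The sketch below is an illustration under stated assumptions: candidates are scanned in lexicographic order (the proof allows an arbitrary order), grain patterns are represented by their indicator vectors from Section III-C, and the function names are hypothetical.

```python
from itertools import product


def error_vectors(n, t):
    """Indicator vectors of grain patterns: first coordinate 0, weight at
    most t, no two adjacent 1's."""
    return [
        e
        for e in product((0, 1), repeat=n)
        if e[0] == 0
        and sum(e) <= t
        and all(not (a and b) for a, b in zip(e, e[1:]))
    ]


def xor(a, b):
    return tuple(u ^ v for u, v in zip(a, b))


def greedy_code(n, t):
    """Pick vectors greedily, excluding every x_j xor e for chosen x_j."""
    errors = error_vectors(n, t)
    covered = set()
    code = []
    for x in product((0, 1), repeat=n):  # lexicographic order (an assumption)
        if x not in covered:
            code.append(x)
            covered.update(xor(x, e) for e in errors)
    return code, errors


def write(x, e):
    """Output of the medium for input x under the grain pattern indicated by
    e: wherever e_j = 1, cell j takes the value written at cell j - 1."""
    return tuple(x[j - 1] if e[j] else x[j] for j in range(len(x)))


n, t = 7, 2
code, errors = greedy_code(n, t)
decodable = all(len({write(x, e) for x in code}) == len(code) for e in errors)
print(len(code), len(errors), decodable)
```

The final check confirms that, for every grain pattern, distinct codewords produce distinct outputs, so a decoder that knows the pattern can recover the codeword.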

B. Scenario II 



This scenario is similar in spirit to the channel with
localized errors of Bassalygo et al. [1]. In that setting, both
the transmitter and the receiver know that all but t positions
of the codevector will remain error-free, and the coordinates
of the t positions which can (but need not) be in error are
known to the transmitter but not the receiver. Thus, in our
Scenario II, the encoder may rely on codes that correct
localized errors, which according to [1] gives the bound
$R_2(\tau) \ge 1 - h(\tau)$. Again, this bound can be improved.

Proposition 9 We have

$$M_2(n,t) \ge \frac{2^n}{2n \sum_{i=0}^{t} \binom{n - i}{i}}.$$

Hence, $R_2(\tau) \ge 1 - (1 - \tau)\, h\bigl(\frac{\tau}{1 - \tau}\bigr)$ for $\tau \le \frac{1}{2} - \frac{1}{2\sqrt{5}} \approx 0.2764$.

Proof : We show that when the encoder knows the error 
locations, then it can successfully transmit 

1 2" 



M > 



(10) 



2n \£ ntt \ 

messages to the decoder, which proves the claimed lower 
bound on M2(n,t). We follow the proof of Theorem 3 of 

m. 

Given a message i E {1, ...,M} to be transmitted, the 
transmitter will use knowledge of the grain pattern <f> (with 
e <t> G £n,t) to encode i using a suitably chosen vector from a 
set of binary vectors X 1 = {xj : j = 1, . . . , n}. A vector 
is said to be gooc/ for e e £ n t if for any i ^ i' and for any 
j' we have, 

d H {x) ffi e, sc}) < d H {x) ffi e, a;}',), 

where •) denotes Hamming distance. The family of sets 
X 1 , i = l,..., M, is good if for any i <E {1, . . . , M} and for 
any e € £„.t, there exists a vector x* € A" 1 that is good for e. 
A good family of sets X 1 , i = 1, . . . , M, enables the encoder 
to transmit any message in {1, . . . , M} with perfect recovery 
by the decoder. Indeed, given the grain pattern <fi, the encoder 
chooses for transmission of message i a vector in X % that is 
good for e^. 

Thus, we only need to show that for M satisfying (TlOb 
there exists a good family of sets X % = {x* : j = 1, . . . , n}, 
i = 1,...,M. There are 2 nHl families of M sets X\ each 
containing at most n binary vectors of length n. Of these, the 
number of families that are not good does not exceed 

M.|fi Bit |.((Af-l)n|£ Bl t|) n -2'* a ( Jltf - 1 ). 

If M satisfies ( [Tol l with equality, then this number is less than 
2" . Therefore, there exists a good family of sets X % . 

The argument for the lower bound on $R_2(\tau)$ is the same as
that given for $R_1(\tau)$ in the proof of Proposition 8, since the
extra multiplicative factor of $\frac{1}{2n}$ does not affect the asymptotic
behavior. ∎
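
The counting step in the proof of Proposition 9 can be checked by direct substitution. The following worked computation is a sketch that uses only the quantities appearing in the proof; it takes $M = 2^n / (2n\,|\mathcal{E}_{n,t}|)$, i.e., (10) with equality, and compares the count of families that are not good with the total number $2^{n^2 M}$ of families:

```latex
\begin{aligned}
\frac{M\,|\mathcal{E}_{n,t}|\,\big((M-1)\,n\,|\mathcal{E}_{n,t}|\big)^{n}\,2^{n^2(M-1)}}{2^{n^2 M}}
  &< M\,|\mathcal{E}_{n,t}|\,\big(M\,n\,|\mathcal{E}_{n,t}|\big)^{n}\,2^{-n^2} \\
  &= \frac{2^{n}}{2n}\,\big(2^{\,n-1}\big)^{n}\,2^{-n^2}
   = \frac{1}{2n} < 1 .
\end{aligned}
```

Since the ratio is below 1, at least one family must be good.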

To summarize, we obtain a lower bound on $R_i(\tau)$, $i = 1$
or 2, of the form

$$R_i(\tau) \ge \max\left\{0.5,\; 1 - (1-\tau)\,h\!\left(\frac{\tau}{1-\tau}\right)\right\}.$$

This is because the rate-1/2 code $\mathcal{R}_n$ defined earlier in the paper is still
viable in the context of Scenarios I and II. A straightforward
upper bound $R_i(\tau) \le 1 - \tau$ follows from the fact that $M_1(n,t)$
and $M_2(n,t)$ cannot exceed $2^{n-t}$, which is simply the
one-bit-per-grain upper bound.

V. Capacity of the Grains Channel 

Thus far in this paper, we have considered a combinatorial 
model of the one-dimensional granular medium, and given 
various bounds on the rate of $t$-grain-correcting codes. We
will now switch to a parallel track by defining a natural
probabilistic model of a channel corresponding to the one- 
dimensional granular medium with grains of length at most 2 
(the "grains channel"). This is a binary-output channel that can 
make an error only at positions where a length-2 grain ends. 
In fact, error events are data-dependent: an error occurs at a 
position where a length-2 grain ends if and only if the channel 
input at that position differs from the previous channel input. 
Our goal is to estimate the Shannon-theoretic capacity for the 
grains channel model. Let us proceed to formal definitions. 

Suppose $x = x_1 x_2 x_3 \ldots$ and $y = y_1 y_2 y_3 \ldots$ denote the input
and output sequence respectively, with $x_i, y_i \in \{0,1\}$ for all $i$.
We further define the sequence $u = u_1 u_2 u_3 \ldots$, where $u_i = 1$
(resp. $u_i = 0$) indicates that a length-2 grain ends (resp. does
not end) at position $i$. We take $u$ to be a first-order Markov
chain, independent of the channel input $x$, having transition
probabilities $P(u_i \mid u_{i-1})$ as tabulated below (for some $p \in [0,1]$):

$$\begin{array}{c|cc}
 & u_i = 0 & u_i = 1 \\ \hline
u_{i-1} = 0 & 1-p & p \\
u_{i-1} = 1 & 1 & 0
\end{array} \tag{11}$$

The grains channel makes an error at position $i$ (i.e., $x_i \ne y_i$)
if and only if $u_i = 1$ and $x_i \ne x_{i-1}$. To be precise,

$$y_i = x_i \oplus (x_i \oplus x_{i-1})\,u_i, \tag{12}$$

where the operations are being performed modulo 2. Equivalently,

$$y_i = \begin{cases} x_i & \text{if } u_i = 0 \\ x_{i-1} & \text{if } u_i = 1. \end{cases} \tag{13}$$

We will find it useful to define the error sequence
$z = z_1, z_2, z_3, \ldots$, where $z_i = x_i \oplus y_i$. Thus,

$$z_i = u_i\,(x_i \oplus x_{i-1}). \tag{14}$$


The case i = 1 is not covered by the above definitions. We 
will include it once we define a finite-state model of the grains 
channel. 
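
The channel law in (11)-(14) can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, and the boundary values `u0` and `x0` stand in for the initial state that the text introduces later.

```python
import random

def sample_grain_ends(n, p, u0=1, rng=random):
    """Sample u_1..u_n from the Markov chain in (11), starting from u_0 = u0."""
    u, prev = [], u0
    for _ in range(n):
        # After a grain end (prev == 1), the next position cannot end a length-2 grain.
        cur = 0 if prev == 1 else int(rng.random() < p)
        u.append(cur)
        prev = cur
    return u

def grains_channel(x, u, x0=0):
    """Apply (13): y_i = x_i if u_i = 0, else x_{i-1}; x0 plays the role of x_0."""
    y, prev = [], x0
    for xi, ui in zip(x, u):
        y.append(prev if ui else xi)
        prev = xi
    return y

def error_sequence(x, u, x0=0):
    """Apply (14): z_i = u_i AND (x_i XOR x_{i-1})."""
    z, prev = [], x0
    for xi, ui in zip(x, u):
        z.append(ui & (xi ^ prev))
        prev = xi
    return z
```

For any input `x` and state sequence `u`, the element-wise XOR of `x` and `grains_channel(x, u)` coincides with `error_sequence(x, u)`, which is the content of (14).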

The grains channel as we have defined above is a special
case of a somewhat more general "write channel" model
considered in [4].

A. Discrete Finite-State Channels 

For easy reference, we record here some important facts
about discrete finite-state channels. The material in this section
is substantially based upon [3, Section 4.6].

A stationary discrete finite-state channel (DFSC) has an
input sequence $x = x_1, x_2, x_3, \ldots$, an output sequence
$y = y_1, y_2, y_3, \ldots$, and a state sequence $s = s_1, s_2, s_3, \ldots$. Each
$x_n$ is a symbol from a finite input alphabet $\mathcal{X}$, each $y_n$ is
a symbol from a finite output alphabet $\mathcal{Y}$, and each state $s_n$
takes values in a finite set of states $\mathcal{S}$. The channel is described
statistically by specifying a conditional probability assignment
$P(y_n, s_n \mid x_n, s_{n-1})$, which is independent of $n$. It is assumed
that, conditional on $x_n$ and $s_{n-1}$, the pair $y_n, s_n$ is statistically
independent of all inputs $x_j$, $j < n$, outputs $y_j$, $j < n$, and
states $s_j$, $j < n-1$. To complete the description of the channel,
an initial state $s_0$, also taking values in $\mathcal{S}$, must be specified.

For a DFSC, we define the lower (or pessimistic) capacity
$\underline{C} = \lim_{n\to\infty} \underline{C}_n$, and upper (or optimistic) capacity
$\overline{C} = \lim_{n\to\infty} \overline{C}_n$, where

$$\underline{C}_n = n^{-1} \max_{Q^n(x^n)} \min_{s_0 \in \mathcal{S}} I(x^n; y^n \mid s_0),$$

$$\overline{C}_n = n^{-1} \max_{Q^n(x^n)} \max_{s_0 \in \mathcal{S}} I(x^n; y^n \mid s_0).$$

In the above expressions, $I(x^n; y^n \mid s_0)$ is the mutual
information between the length-$n$ input $x^n = (x_1, \ldots, x_n)$
and the length-$n$ output $y^n = (y_1, \ldots, y_n)$, given the value of
the initial state $s_0$, and the maximum is taken over probability
distributions $Q^n(x^n)$ on the input $x^n$. The limits in the above
definitions of $\underline{C}$ and $\overline{C}$ are known to exist. Clearly, $\underline{C}_n \le \overline{C}_n$
for all $n$, and thus, $\underline{C} \le \overline{C}$. The capacities $\underline{C}$ and $\overline{C}$ have an
operational meaning in the usual Shannon-theoretic sense;
see Theorems 4.6.2 and 5.9.2 in [3].

The upper and lower capacities coincide for a large class
of channels known as indecomposable channels. Roughly, an
indecomposable DFSC is a DFSC in which the effect of the
initial state $s_0$ dies away with time. Formally, let $q(s_n \mid x^n, s_0)$
denote the conditional probability that the $n$th state is $s_n$,
given the input sequence $x^n = (x_1, \ldots, x_n)$ and initial state
$s_0$. Evidently, $q(s_n \mid x^n, s_0)$ is computable from the channel
statistics. A DFSC is indecomposable if, for any $\varepsilon > 0$, there
exists an $n_0$ such that for all $n \ge n_0$, we have

$$|q(s_n \mid x^n, s_0) - q(s_n \mid x^n, s_0')| \le \varepsilon$$

for all $s_n$, $x^n$, $s_0$ and $s_0'$. Theorem 4.6.3 of [3] gives an
easy-to-check necessary and sufficient condition for a DFSC to be
indecomposable: for some fixed $n$ and each $x^n$, there exists a
choice for $s_n$ (which may depend on $x^n$) such that

$$\min_{s_0} q(s_n \mid x^n, s_0) > 0. \tag{15}$$

We note here that the channels we consider in the subsequent
sections are indecomposable except in very special cases. For
these special cases, it can still be shown that $\underline{C} = \overline{C}$ holds.

We make a few comments about DFSCs for which $\underline{C} = \overline{C}$
holds. We denote by $C$ the common value of $\underline{C}$ and $\overline{C}$. This $C$,
which we refer to simply as the capacity of the DFSC, can be
expressed alternatively. If we assign a probability distribution
to the initial state, so that $s_0$ becomes a random variable, then
$C = \lim_{n\to\infty} C_n$, where

$$C_n = \frac{1}{n} \max_{Q^n(x^n)} I(x^n; y^n \mid s_0). \tag{16}$$

Clearly, $\underline{C}_n \le C_n \le \overline{C}_n$ for all $n$, so that $C$, as defined above,
is indeed the common value of $\underline{C}$ and $\overline{C}$. Note that this is
independent of the choice of the probability distribution on $s_0$.

A further simplification to the expression for capacity is
possible. Since $|I(x^n; y^n) - I(x^n; y^n \mid s_0)| \le \log_2 |\mathcal{S}|$ (see,
for example, [3, Appendix 4A, Lemma 1]), we in fact have

$$C = \lim_{n\to\infty} \frac{1}{n} \max_{Q^n(x^n)} I(x^n; y^n). \tag{17}$$

The capacity of a DFSC is difficult to compute in general.
A useful lower bound that is sometimes easier to compute (or
at least estimate) is the so-called symmetric information rate
(SIR) of the DFSC:

$$R = \lim_{n\to\infty} \frac{1}{n} I(x^n; y^n), \tag{18}$$

where the input sequence $x$ is an i.i.d. Bernoulli(1/2) random
sequence.

B. First results 

It is easy to see that the grains channel is a DFSC, where
the $n$th state $s_n$ is the pair $(u_n, x_n)$, which takes values in
the finite set $\mathcal{S} = \{(0,0), (0,1), (1,0), (1,1)\}$. Again, for
completeness, we assume an initial state $s_0$ that takes values
in $\mathcal{S}$.¹

Proposition 10 The grains channel is indecomposable for $p < 1$.

Proof: We must check that the condition in (15) holds. We
take $n = 1$ and $s_1 = (0, x_1)$. Then, $\min_{s_0} q(s_1 \mid x_1, s_0) =
\min_{j \in \{0,1\}} P(u_1 = 0 \mid u_0 = j) = 1 - p > 0$. ∎

As a consequence of the above proposition, the equality
$\underline{C} = \overline{C}$ holds for the grains channel when $p < 1$. In fact, this
equality also holds for the grains channel when $p = 1$, as the
following result shows.

Proposition 11 For the grains channel with $p = 1$, we have
$\underline{C} = \overline{C} = 1/2$.

Proof: We have, with probability 1,

$$u = u_1, u_2, u_3, u_4, u_5, u_6, \ldots = \begin{cases} 0,1,0,1,0,1,\ldots & \text{if } u_0 = 1 \\ 1,0,1,0,1,0,\ldots & \text{if } u_0 = 0. \end{cases}$$

Thus, once the initial state $s_0 = (u_0, x_0)$ is fixed, the output
$y$ of the grains channel is a deterministic function of the input
$x$:

$$y = y_1, y_2, y_3, y_4, y_5, y_6, \ldots = \begin{cases} x_1, x_1, x_3, x_3, x_5, x_5, \ldots & \text{if } s_0 = (1, x_0) \\ x_0, x_2, x_2, x_4, x_4, x_6, \ldots & \text{if } s_0 = (0, x_0). \end{cases}$$

Therefore, for any fixed $s \in \mathcal{S}$, we have $H(y^n \mid x^n, s_0 = s) = 0$,
and hence, $I(x^n; y^n \mid s_0 = s) = H(y^n \mid s_0 = s)$. If
$x^n$ is a sequence of i.i.d. Bernoulli(1/2) random variables, then
$\min_{s \in \mathcal{S}} H(y^n \mid s_0 = s) = H(y^n \mid s_0 = (0, x_0)) = \lfloor n/2 \rfloor$.
It follows that $\underline{C}_n \ge \frac{1}{n}\lfloor n/2 \rfloor$, so that $\underline{C} \ge 1/2$. On the other
hand, for any input distribution $Q^n(x^n)$, and any $s \in \mathcal{S}$, we
have $H(y^n \mid s_0 = s) \le \lceil n/2 \rceil$. Consequently, $\overline{C}_n \le \frac{1}{n}\lceil n/2 \rceil$,
and hence, $\overline{C} \le 1/2$. We conclude that $\underline{C} = \overline{C} = 1/2$. ∎
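
As a sanity check on the proof of Proposition 11, the deterministic behavior at $p = 1$ can be enumerated for small $n$. The sketch below (function name hypothetical) lists the distinct outputs over all inputs; because the output is then a function of independent uniform input bits, the base-2 logarithm of the count equals $H(y^n \mid s_0 = s)$ for i.i.d. Bernoulli(1/2) inputs.

```python
from itertools import product

def outputs_at_p1(n, u0, x0=0):
    """Set of outputs y^n over all inputs x^n when p = 1 and s_0 = (u0, x0)."""
    # With p = 1 the chain alternates deterministically: u_i = 1 - u_{i-1}.
    u, prev = [], u0
    for _ in range(n):
        prev = 1 - prev
        u.append(prev)
    seen = set()
    for x in product((0, 1), repeat=n):
        y, xprev = [], x0
        for xi, ui in zip(x, u):
            y.append(xprev if ui else xi)  # rule (13)
            xprev = xi
        seen.add(tuple(y))
    return seen
```

For $n = 7$, the count is $2^{\lfloor 7/2 \rfloor} = 8$ when $u_0 = 0$ and $2^{\lceil 7/2 \rceil} = 16$ when $u_0 = 1$, in line with the two cases in the proof.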

In view of the two propositions above, the capacity of the
grains channel is defined by (17). From here onward, we
denote this capacity by $C^{g}$, and use the notation $C^{g}(p)$ when
the dependence on $p$ needs to be emphasized. It is difficult to
compute the capacity $C^{g}$ exactly, so we will provide useful
upper and lower bounds. We note here for future reference the
trivial bound obtained from Proposition 11:

$$C^{g}(p) \ge C^{g}(1) = 1/2. \tag{19}$$

¹ To be strictly faithful to the granular medium we are modeling, we should
restrict $s_0$ to take values only in $\{(1,0), (1,1)\}$, so that $u_0 = 1$. This would
imply $u_1 = 0$, meaning that no length-2 grain ends at the first bit cell of the
medium, corresponding to physical reality. But this makes no difference to
the asymptotics of the channel, and in particular, to the channel capacity.


C. Upper Bound: BINAEras

Consider a binary-input channel similar to the binary erasure
channel, except that erasures in consecutive positions
are not allowed. Formally, this is a channel with a binary
input sequence $x = x_1, x_2, x_3, \ldots$, with $x_i \in \{0,1\}$ for all
$i$, and a ternary output sequence $y = y_1, y_2, y_3, \ldots$, with
$y_i \in \{0,1,\epsilon\}$ for all $i$, where $\epsilon$ is an erasure symbol. The
input-output relationship is determined by a binary sequence
$u = u_1, u_2, u_3, \ldots$, which is a first-order Markov chain,
independent of the input sequence $x$, with transition probabilities
$P(u_i \mid u_{i-1})$ as in (11). We then have

$$y_i = \begin{cases} x_i & \text{if } u_i = 0 \\ \epsilon & \text{if } u_i = 1. \end{cases} \tag{20}$$

Since $P(u_i = 1 \mid u_{i-1} = 1) = 0$, adjacent erasures do not
occur, so we term this channel the binary-input no-adjacent-erasures
(BINAEras) channel. To describe the channel completely,
we define an initial state $u_0$ taking values in $\{0,1\}$.

The BINAEras channel is a DFSC for which $\underline{C} = \overline{C}$ holds,
and its capacity, which we denote by $C^{\epsilon}(p)$, can be computed
explicitly.

Theorem 12 For the BINAEras channel with parameter $p \in [0,1]$,
we have $\underline{C} = \overline{C} = C^{\epsilon}(p) \triangleq \frac{1}{1+p}$.

Intuitively, the average erasure probability of a symbol equals
$\bar{p} = \frac{p}{1+p}$, and the capacity $C^{\epsilon}(p)$ equals $1 - \bar{p}$. A formal proof
is given in Appendix A.
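
The intuition behind Theorem 12 can be made concrete with a few lines of code. This is a minimal sketch with hypothetical function names: it returns the stationary distribution of the chain in (11) and evaluates one minus the stationary erasure probability.

```python
def stationary_u(p):
    """Stationary distribution (P[u = 0], P[u = 1]) of the chain in (11)."""
    return 1.0 / (1.0 + p), p / (1.0 + p)

def binaeras_capacity(p):
    """Capacity value in Theorem 12: one minus the average erasure probability."""
    return 1.0 - stationary_u(p)[1]
```

Stationarity can be confirmed by applying one step of (11): the new probability of state 0 is $P[u=0](1-p) + P[u=1]$, which again equals $1/(1+p)$.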

We claim that the grains channel is a stochastically degraded
BINAEras channel. Indeed, the grains channel is obtained by
cascading the BINAEras channel with a ternary-input channel
defined as follows: the input sequence $y = y_1, y_2, y_3, \ldots$,
$y_i \in \{0,1,\epsilon\}$, is transformed to the output sequence
$y' = y'_1, y'_2, y'_3, \ldots$ according to the rule

$$y'_i = \begin{cases} y_i & \text{if } y_i \ne \epsilon \\ y_{i-1} & \text{if } y_i = \epsilon. \end{cases} \tag{21}$$

To cover the case when $y_1 = \epsilon$, we set $y'_1$ equal to some
arbitrary $y_0 \in \{0,1\}$. It is straightforward to verify, via (20),
(21) and the fact that $P(u_i = 1 \mid u_{i-1} = 1) = 0$, that
the cascade of the BINAEras channel with the above channel
has an input-output mapping $x_i \mapsto y'_i$ given by the equation
obtained by replacing $y_i$ with $y'_i$ in (13). This immediately
leads to the following theorem.
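
The degradation argument can be checked exhaustively for short blocks. The sketch below is an illustration only: the erasure stand-in `E`, the function names, and the choice $y_0 = x_0 = 0$ are assumptions, and the second-stage channel is taken to copy the previous symbol whenever it sees an erasure.

```python
from itertools import product

E = "e"  # stand-in for the erasure symbol

def binaeras(x, u):
    """Rule (20): erase position i exactly when u_i = 1."""
    return [E if ui else xi for xi, ui in zip(x, u)]

def fill_erasures(y, y0=0):
    """Second stage: replace each erasure by the previous symbol (y0 at i = 1)."""
    out, prev = [], y0
    for yi in y:
        out.append(prev if yi == E else yi)
        prev = yi
    return out

def grains(x, u, x0=0):
    """Rule (13): y_i = x_i if u_i = 0, else x_{i-1}."""
    out, prev = [], x0
    for xi, ui in zip(x, u):
        out.append(prev if ui else xi)
        prev = xi
    return out

def cascade_matches(n):
    """Compare the cascade with (13) for every x and every u without adjacent 1s."""
    for u in product((0, 1), repeat=n):
        if any(a and b for a, b in zip(u, u[1:])):
            continue  # excluded because P(u_i = 1 | u_{i-1} = 1) = 0
        for x in product((0, 1), repeat=n):
            if fill_erasures(binaeras(x, u)) != grains(x, u):
                return False
    return True
```

The check relies on the no-adjacent-erasures property: the symbol before an erasure is always an unerased input bit.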

Theorem 13 For $p \in [0,1]$, we have $C^{g}(p) \le C^{\epsilon}(p) = \frac{1}{1+p}$.

Remark: We remark that any code that corrects $t$ nonadjacent
substitution errors (bit flips) also corrects $t$ grain errors. It is
therefore tempting to bound the capacity of the grains channel
by the capacity of the binary channel with nonadjacent errors.
Such a channel is defined similarly to the BINAEras channel:
the channel noise is controlled by a first-order Markov chain
$u$ as in (11), and $y_i = x_i \oplus u_i$ for all $i \ge 1$. The capacity of this
channel is computed as in the BINAEras case and equals
$1 - h(p)/(1+p)$, where $h(p)$ denotes the binary entropy function.
However, a closer examination convinces one that this quantity
does not provide a valid lower bound for $C^{g}(p)$.

D. Lower Bound: The Symmetric Information Rate 

In this section, we derive an exact expression for the SIR of
the grains channel, which gives a lower bound on the capacity
of the channel. In accordance with the definition of SIR (18),
assume that $x$ is an i.i.d. Bernoulli(1/2) random sequence. With
this assumption, the state sequence $s$ is a first-order Markov
chain. Also, each output symbol $y_n$ is easily verified to be a
Bernoulli(1/2) random variable (but $y_n$ is not independent of
$y_{n-1}$).

We also assume that the initial state $s_0$ is a random
variable distributed according to the stationary distribution
of the Markov chain, so that the sequence $s$ is a stationary
Markov chain. It follows that the output sequence $y$
is a stationary random sequence, so that the entropy rate
$H(Y) := \lim_{n\to\infty} \frac{1}{n} H(y^n)$ exists. It is also worth noting here
that the initial distribution assumed on $s_0$ causes the Markov
chain $u$ to be stationary as well. In particular, the random
variables $u_i$, $i \ge 0$, all have the stationary distribution given
by $P(u_i = 0) = \frac{1}{1+p}$ and $P(u_i = 1) = \frac{p}{1+p}$.

We have

$$R^{g} = \lim_{n\to\infty} \frac{1}{n} I(x^n; y^n) \tag{22}$$

and

$$I(x^n; y^n) = H(y^n) - H(y^n \mid x^n) = H(y^n) - H(z^n \mid x^n). \tag{23}$$

As noted above, $H(Y) = \lim_{n\to\infty} \frac{1}{n} H(y^n)$ exists. In fact,
we can give an exact expression for $H(Y)$ in terms of an
infinite series.

Proposition 14 The entropy rate of the output process of the
grains channel is given by

$$H(Y) = \frac{1}{2(1+p)} \sum_{j=2}^{\infty} h(\beta_j) \prod_{k=2}^{j-1} (1 - \beta_k),$$

where

$$\beta_j := \Pr[y_{j+1} = 1 \mid y_j = y_{j-1} = \cdots = y_2 = 0,\ y_1 = 1]$$

is given by the following recursion: $\beta_2 = \frac{1}{2}(1 - p)$, and for
$j \ge 3$,

$$\beta_j = \frac{1}{2} \left( \frac{1 - (1+p)\,\beta_{j-1}}{1 - \beta_{j-1}} \right). \tag{24}$$

The lengthy proof of this proposition is given in Appendix B.

Remark: The following explicit expression for $\beta_j$, $j \ge 2$,
can be proved by induction from (24):

$$\beta_j = \frac{2\left(\vartheta_+^{\,j} - \vartheta_-^{\,j}\right)}{(3 + B + p)\,\vartheta_+^{\,j} - (3 - B + p)\,\vartheta_-^{\,j}}, \tag{25}$$

where $\vartheta_\pm = 1 - p \pm B$ and $B = \sqrt{p^2 + 6p + 1}$.
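
A closed form for $\beta_j$ of this kind can be compared numerically with the recursion (24). The sketch below assumes the convention $\vartheta_\pm = 1 - p \pm B$ with $B = \sqrt{p^2 + 6p + 1}$, and the form $\beta_j = 2(\vartheta_+^j - \vartheta_-^j) / \big((3+B+p)\vartheta_+^j - (3-B+p)\vartheta_-^j\big)$; the function names are hypothetical.

```python
import math

def beta_recursive(p, j):
    """beta_j from the recursion (24), starting at beta_2 = (1 - p) / 2."""
    b = 0.5 * (1.0 - p)
    for _ in range(j - 2):
        b = 0.5 * (1.0 - (1.0 + p) * b) / (1.0 - b)
    return b

def beta_closed(p, j):
    """Closed form under the assumed convention theta_pm = 1 - p +/- B."""
    B = math.sqrt(p * p + 6.0 * p + 1.0)
    tp, tm = 1.0 - p + B, 1.0 - p - B
    num = 2.0 * (tp ** j - tm ** j)
    den = (3.0 + B + p) * tp ** j - (3.0 - B + p) * tm ** j
    return num / den
```

Two easy spot checks: at $p = 0$ both functions give $1/2$ for every $j$, and at $p = 1$ both alternate between 0 (even $j$) and $1/2$ (odd $j$).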

Our next result shows that $\lim_{n\to\infty} \frac{1}{n} H(z^n \mid x^n)$ also exists,
and gives an exact expression for it, again in terms of an
infinite series. Appendix B contains a proof of this result.

Proposition 15 When $x$ is an i.i.d. uniform Bernoulli sequence,
we have

$$\lim_{n\to\infty} \frac{1}{n} H(z^n \mid x^n) = \frac{1 + p/2}{1 + p} \sum_{j=2}^{\infty} 2^{-j}\, h\!\left( \frac{1 - (-p)^j}{1 + p} \right).$$


Together, (22), (23), and Propositions 14 and 15 provide an
exact expression for the SIR of the grains channel. This, along
with the trivial bound (19), yields the following lower bound
on the capacity $C^{g}$.

Theorem 16 The capacity $C^{g}(p) \ge \max(1/2, R^{g}(p))$, where
$R^{g}(p)$ is the SIR of the grains channel and is given by the
following expression:

$$R^{g}(p) = \frac{1}{2(1+p)} \sum_{j=2}^{\infty} h(\beta_j) \prod_{k=2}^{j-1} (1 - \beta_k) \;-\; \frac{1 + p/2}{1 + p} \sum_{j=2}^{\infty} 2^{-j}\, h\!\left( \frac{1 - (-p)^j}{1 + p} \right),$$

with $\beta_j$ as in (24) or (25).
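
The series for the SIR lends itself to direct numerical evaluation. The following is a minimal sketch, not the authors' code: it assumes that $R^{g}(p)$ is the difference of the two series in Propositions 14 and 15, uses the recursion (24) for $\beta_j$, and truncates both sums at a hypothetical default index `J = 60`.

```python
import math

def h(q):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)

def sir_truncated(p, J=60):
    """Partial sum of the SIR series, keeping the terms j = 2, ..., J."""
    beta = 0.5 * (1.0 - p)  # beta_2
    prod = 1.0              # empty product for j = 2
    t_sum = 0.0             # running sum for the H(Y) series
    s_sum = 0.0             # running sum for the H(z | x) series
    for j in range(2, J + 1):
        t_sum += h(beta) * prod
        s_sum += 2.0 ** (-j) * h((1.0 - (-p) ** j) / (1.0 + p))
        prod *= 1.0 - beta                                    # now covers k = 2..j
        beta = 0.5 * (1.0 - (1.0 + p) * beta) / (1.0 - beta)  # beta_{j+1} via (24)
    return t_sum / (2.0 * (1.0 + p)) - (1.0 + p / 2.0) / (1.0 + p) * s_sum
```

At the endpoints the series can be summed by hand: $p = 0$ gives rate 1 (a noiseless channel) and $p = 1$ gives rate 1/2, which the sketch reproduces up to the truncation error.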



In Figure 2, we plot the upper and lower bounds on $C^{g}(p)$
stated in Theorems 13 and 16, as well as the value of $R^{g}(p)$
from Theorem 16. Observe that the SIR is a strict lower bound
on the capacity, at least for $0.56 < p < 1$, when $R^{g}(p) < 1/2$.

The plots are obtained by numerically evaluating $R^{g}(p)$ by
truncating its infinite series at some large value of $j$. We give
here a somewhat crude, but useful, estimate of the error in
truncating this series at some index $j = J$, with $J \ge 2$. Define
the partial sums

$$S_J = \frac{1 + p/2}{1 + p} \sum_{j=2}^{J} 2^{-j}\, h\!\left( \frac{1 - (-p)^j}{1 + p} \right), \qquad T_J = \frac{1}{2(1+p)} \sum_{j=2}^{J} h(\beta_j) \prod_{k=2}^{j-1} (1 - \beta_k), \tag{26}$$

and note that the $J$th partial sum of the $R^{g}(p)$ series is
precisely $T_J - S_J$.

Proposition 17 The error $|R^{g}(p) - (T_J - S_J)|$ in truncating
the $R^{g}(p)$ series at an index $j = J$, with $J \ge 2$, is at most

$$\frac{1}{1+p} \left[ (1 + p/2)\, 2^{-J} + 2^{-\lfloor (J+1)/2 \rfloor} \right].$$

In particular, for any $p \in [0,1]$, the truncation error is at most
$2^{-J} + 2^{-\lfloor (J+1)/2 \rfloor}$.

We defer the proof to Appendix B.

The plot of $R^{g}(p)$ in Figure 2(a) was generated using $J = 15$
terms of the infinite series, so the plotted curve is within
0.004 of the true $R^{g}$ curve for all $p$.
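
The figure of 0.004 follows from the $p$-independent bound in Proposition 17 evaluated at $J = 15$; as a worked check:

```latex
2^{-J} + 2^{-\lfloor (J+1)/2 \rfloor}\Big|_{J=15}
  = 2^{-15} + 2^{-8}
  \approx 0.0000305 + 0.0039063
  \approx 0.0039368 < 0.004 .
```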




(a) Bounds on $C^{g}(p)$. The gray area shows the gap between the
lower bound of Theorem 16 and the upper bound of Theorem 13.

(b) The symmetric information rate $R^{g}(p)$.

Fig. 2. Plots of the upper and lower bounds on the capacity of the grains
channel $C^{g}(p)$, and the SIR of the grains channel $R^{g}(p)$, as functions of $p$.



E. Zero-Error Capacity

We end with a few remarks on the zero-error capacity of the
grains channel. We are interested in the maximum zero-error
information rate, $R_0(n)$, achievable over the grains channel
with parameter $p \in [0,1]$ and input $x^n$. The case when $p = 0$
is trivial (the channel introduces no errors), so we consider
$p > 0$.

The zero-error analysis depends on the initial state $s_0$ of the
channel. Suppose that $s_0$ is such that $\Pr[u_1 = 1] > 0$. Then,
the state sequence $u^n = 1, 0, 1, 0, \ldots, (n \bmod 2)$ is realized
with some positive probability. Corresponding to this state
sequence, we have $y^n = x_0, x_2, x_2, x_4, \ldots, x_{2\lfloor n/2 \rfloor}$. Thus,
at most $\lfloor n/2 \rfloor$ bits can be transmitted without error across
this realization of the channel. Hence, $R_0(n) \le \frac{1}{n} \lfloor n/2 \rfloor$.

This zero-error information rate can actually be achieved.
Consider the binary length-$n$ code $\mathcal{R}_n$ defined earlier in the paper, which
has $2^{\lfloor n/2 \rfloor}$ codewords. When a codeword from $\mathcal{R}_n$ is sent
across any realization of the grains channel, the bits at even
coordinates remain unchanged. Thus, $\lfloor n/2 \rfloor$ bits of information
can be transmitted without error, which proves that
$R_0(n) = \frac{1}{n} \lfloor n/2 \rfloor$.
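
The achievability argument can be checked by brute force for small $n$. The sketch below rests on an assumption that is not restated in this section: the code is taken to consist of words with $x_{2k-1} = x_{2k}$ for every $k$ (odd lengths are padded with a fixed 0). All function names are hypothetical.

```python
from itertools import product

def grains(x, u, x0):
    """Rule (13): y_i = x_i if u_i = 0, else x_{i-1}."""
    out, prev = [], x0
    for xi, ui in zip(x, u):
        out.append(prev if ui else xi)
        prev = xi
    return out

def valid_states(n):
    """All u^n with no two adjacent 1s, as required by (11)."""
    for u in product((0, 1), repeat=n):
        if not any(a and b for a, b in zip(u, u[1:])):
            yield u

def repetition_codewords(n):
    """Assumed code: each information bit is written twice in a row."""
    for bits in product((0, 1), repeat=n // 2):
        word = [b for b in bits for _ in range(2)]
        if n % 2:
            word.append(0)  # fixed padding symbol for odd n (an assumption)
        yield tuple(word), bits

def even_bits_survive(n):
    """True if y_2, y_4, ... equal the information bits for every realization."""
    for word, bits in repetition_codewords(n):
        for u in valid_states(n):
            for x0 in (0, 1):
                if tuple(grains(word, u, x0)[1::2]) != bits:
                    return False
    return True
```

Under this assumption, an even coordinate is either read correctly or overwritten by an identical preceding bit, so the decoder can simply read off $y_2, y_4, \ldots$.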

On the other hand, suppose that the initial state $s_0$ is such
that $\Pr[u_1 = 1] = 0$. Then, the worst-case channel realization
is caused by the state sequence $u^n = 0, 1, 0, 1, \ldots, (1 + n \bmod 2)$.
In this case, the channel is such that the first coordinate
of the input sequence is always received without error
at the output. A slight modification of the preceding argument
now shows that $R_0(n) = \frac{1}{n} \lceil n/2 \rceil$.

We have thus proved the following result.

Proposition 18 Consider a grains channel with parameter $p > 0$.
If the initial state $s_0$ is such that $\Pr[u_1 = 1] > 0$, then
$R_0(n) = \frac{1}{n} \lfloor n/2 \rfloor$; otherwise, $R_0(n) = \frac{1}{n} \lceil n/2 \rceil$.

In any case, the zero-error capacity of the channel is
$C_0 = \lim_{n\to\infty} R_0(n) = 1/2$.

Appendix A: Proof of Theorem 12

Observe first that the BINAEras channel is indecomposable
for $p < 1$. Indeed, for this channel, the condition in (15)
reduces to showing that for some fixed $n$, there exists a
choice for $u_n$ such that $\min_{u_0} P(u_n \mid u_0) > 0$. This condition
clearly holds for $n = 1$ and $u_1 = 0$: $\min_{j \in \{0,1\}} P(u_1 = 0 \mid
u_0 = j) = 1 - p > 0$, provided $p < 1$. We deal with the
indecomposable case in this appendix; when $p = 1$, the proof
for $\underline{C} = \overline{C} = 1/2$ follows, mutatis mutandis, the proof of
Proposition 11.

When the channel is indecomposable, we have $\underline{C} = \overline{C} = C$.
We will show that $C = \frac{1}{1+p}$. Choose the distribution on $u_0$
to be the stationary distribution of the Markov process $u$, so
that $P(u_0 = 0) = \frac{1}{1+p}$ and $P(u_0 = 1) = \frac{p}{1+p}$. Consequently,
$u$ is a stationary process, and in particular, for all $i \ge 1$, we
have $P(u_i = 0) = \frac{1}{1+p}$ and $P(u_i = 1) = \frac{p}{1+p}$.

Observe that

$$\begin{aligned}
I(x^n; y^n \mid u_0) &= H(y^n \mid u_0) - H(y^n \mid x^n, u_0) \\
&\overset{(a)}{=} H(y^n \mid u_0) - H(u^n \mid x^n, u_0) \\
&\overset{(b)}{=} H(y^n \mid u_0) - H(u^n \mid u_0),
\end{aligned}$$

with equality (a) above due to the fact that, given $x^n$, the
sequences $y^n$ and $u^n$ uniquely determine each other, and
equality (b) because $u^n$ is independent of $x^n$. Furthermore,
since $u$ is a stationary first-order Markov process, we have
$H(u^n \mid u_0) = \sum_{i=1}^{n} H(u_i \mid u_{i-1}) = n H(u_1 \mid u_0) = n \frac{h(p)}{1+p}$.
Hence,

$$C_n = n^{-1} \max_{Q^n(x^n)} H(y^n \mid u_0) - \frac{h(p)}{1+p}. \tag{28}$$

Now, $H(y^n \mid u_0) = \sum_{i=1}^{n} H(y_i \mid y^{i-1}, u_0)$. Since $y^{i-1}$
completely determines $u^{i-1}$, we have by the data processing
inequality [2, Theorem 2.8.1],

$$H(y_i \mid y^{i-1}, u_0) \le H(y_i \mid u^{i-1}, u_0).$$

We further have

$$\begin{aligned}
H(y_i \mid u^{i-1}, u_0) &\le H(y_i \mid u_{i-1}) \\
&= H(y_i \mid u_{i-1} = 0)\,\frac{1}{1+p} + H(y_i \mid u_{i-1} = 1)\,\frac{p}{1+p}.
\end{aligned}$$

Given $u_{i-1} = 1$, $y_i$ is a binary random variable (since $u_i = 0$
with probability 1), and thus, $H(y_i \mid u_{i-1} = 1) \le 1$. On the
other hand, we have $P(y_i = \epsilon \mid u_{i-1} = 0) = P(u_i = 1 \mid
u_{i-1} = 0) = p$, and so the conditional entropy $H(y_i \mid u_{i-1} = 0)$
is maximized when $P(y_i = 0 \mid u_{i-1} = 0) = P(y_i = 1 \mid
u_{i-1} = 0) = (1-p)/2$. This yields $H(y_i \mid u_{i-1} = 0) \le
h(p) + 1 - p$. Putting all the inequalities together, we find that

$$\begin{aligned}
H(y^n \mid u_0) &= \sum_{i=1}^{n} H(y_i \mid y^{i-1}, u_0) \\
&\le n \left( \big(h(p) + 1 - p\big)\,\frac{1}{1+p} + \frac{p}{1+p} \right)
= n \left( \frac{1 + h(p)}{1+p} \right).
\end{aligned}$$

It is not difficult to check that the above in fact holds with
equality when the input sequence $x^n$ is an i.i.d. sequence of
Bernoulli(1/2) random variables. Thus,

$$n^{-1} \max_{Q^n(x^n)} H(y^n \mid u_0) = \frac{1 + h(p)}{1+p}.$$

Plugging this into (28), we obtain that $C_n = \frac{1}{1+p}$ for all $n$,
and hence, $C = \frac{1}{1+p}$.

Appendix B: Proofs of Propositions 14, 15 and 17

B.1. Proof of Proposition 14

Since $\lim_{n\to\infty} \frac{1}{n} H(y^n) = \lim_{i\to\infty} H(y_{i+1} \mid y^i)$, we need
show that the latter limit equals the expression in the statement
of the proposition. We will work with the identity

$$H(y_{i+1} \mid y^i) = \sum_{b \in \{0,1\}^i} H(y_{i+1} \mid y^i = b) \Pr[y^i = b].$$

From the channel input-output relationship given by (13) and
the fact that the input $x$ is an i.i.d. Bernoulli(1/2) sequence, it
is clear that $\Pr[y^i = b] = \Pr[y^i = \bar{b}]$, where $\bar{b} = b \oplus 1^i$ is
the sequence obtained by flipping each bit in $b$. It then also
follows that $H(y_{i+1} \mid y^i = b) = H(y_{i+1} \mid y^i = \bar{b})$, since
$\Pr[y_{i+1} = 1 \mid y^i = b] = \Pr[y_{i+1} = 0 \mid y^i = \bar{b}]$. Hence,

$$H(y_{i+1} \mid y^i) = 2 \sum_{b \in B} H(y_{i+1} \mid y^i = b) \Pr[y^i = b], \tag{29}$$

where $B = \{(b_i, \ldots, b_1) \in \{0,1\}^i : b_i = 0\}$ is the set of
all binary length-$i$ sequences that have a 0 in the leftmost
coordinate.

Fix $i \ge 2$. Define, for $2 \le j \le i$, the events

$$B_j = \{y^i : (y_i, y_{i-1}, \ldots, y_{i-j+1}) = 0^{j-1}1\},$$

which, together with the event $\{y^i = 0^i\}$, form a partition of
$B$. Here, $0^{j-1}1$ is shorthand for the $j$-tuple $(0, \ldots, 0, 1)$. We
record two facts about $B_j$. First,

$$\Pr[y^i \in B_j] = \Pr[(y_i, y_{i-1}, \ldots, y_{i-j+1}) = 0^{j-1}1]
= \Pr[(y_j, y_{j-1}, \ldots, y_1) = 0^{j-1}1], \tag{30}$$

the last equality stemming from the fact that $y$ is stationary.
Second, by the following lemma,

$$H(y_{i+1} \mid y^i = b) = h(\Pr[y_{i+1} = 1 \mid y^i = b]) \tag{31}$$

is invariant over $B_j$.

Lemma 19 For $b \in B_j$, $\Pr[y_{i+1} = 1 \mid y^i = b]$ equals

$$\tfrac{1}{2} \Pr[u_j = 0 \mid (y_{j-1}, y_{j-2}, \ldots, y_2) = 0^{j-2},\ (u_1, x_1) = (0,0)].$$

Proof: The proof relies upon the following claim:
Suppose that $y_{k-1} = \bar{b}$; then, with probability 1, we
have $y_k = b$ if and only if $s_k := (u_k, x_k) = (0, b)$.
Indeed, even without the assumption on $y_{k-1}$, the "if" part
holds trivially. For the "only if" part, assume that $y_{k-1} = \bar{b}$
and $y_k = b$. Note that if $u_k = 1$, then with probability 1, we
have $u_{k-1} = 0$. Hence, by way of (13), we have $y_k = x_{k-1} =
y_{k-1}$. However, $y_{k-1} \ne y_k$ by assumption; so we must have
$u_k = 0$. Consequently, $y_k = x_k$, so that $x_k = b$.

Consider any $b \in B_j$. From the claim, we have

$$\Pr[y_{i+1} = 1 \mid y^i = b] = \Pr[(u_{i+1}, x_{i+1}) = (0,1) \mid y^i = b]
= \tfrac{1}{2} \Pr[u_{i+1} = 0 \mid y^i = b],$$

where we have used the fact that $x_{i+1}$ is independent of $y^i$.
Note that, in the event $y^i = b$, we have $y_{i-j+2} = 0$ and
$y_{i-j+1} = 1$, so that by the claim again,

$$\Pr[u_{i+1} = 0 \mid y^i = b] = \Pr[u_{i+1} = 0 \mid y^i = b,\ s_{i-j+2} = (0,0)].$$

Now, given the channel state $s_{i-j+2} = (0,0)$, the random
variables $u_{i+1}, y_i, y_{i-1}, \ldots, y_{i-j+2}$ are conditionally
independent of the past output $y^{i-j+1}$. Furthermore, given
$s_{i-j+2} = (0,0)$, the random variable $y_{i-j+2}$ is uniquely
determined: $y_{i-j+2} = 0$. Hence,

$$\Pr[u_{i+1} = 0 \mid y^i = b,\ s_{i-j+2} = (0,0)] =
\Pr[u_{i+1} = 0 \mid (y_i, \ldots, y_{i-j+3}) = 0^{j-2},\ s_{i-j+2} = (0,0)].$$

Finally, by the joint stationarity of $y$ and $u$, the right-hand
side above is equal to

$$\Pr[u_j = 0 \mid (y_{j-1}, y_{j-2}, \ldots, y_2) = 0^{j-2},\ s_1 = (0,0)],$$

which is what we needed to show. ∎

In the statement of Proposition 14, we defined $\beta_j =
\Pr[y_{j+1} = 1 \mid (y_j, y_{j-1}, \ldots, y_1) = 0^{j-1}1]$. Note that if we
set $i = j$ in Lemma 19, we get

$$\beta_j = \tfrac{1}{2} \Pr[u_j = 0 \mid (y_{j-1}, \ldots, y_2) = 0^{j-2},\ (u_1, x_1) = (0,0)]. \tag{32}$$

From (29)-(32), and Lemma 19, we have

$$H(y_{i+1} \mid y^i) = 2 \sum_{j=2}^{i} h(\beta_j) \Pr[(y_j, \ldots, y_1) = 0^{j-1}1]
+ 2 H(y_{i+1} \mid y^i = 0^i) \Pr[y^i = 0^i]. \tag{33}$$

The term at the end of the above expression vanishes as
$i \to \infty$, as we show below for completeness.

Lemma 20 $\lim_{i\to\infty} H(y_{i+1} \mid y^i = 0^i) \Pr[y^i = 0^i] = 0$.

Proof: Since $0 \le H(y_{i+1} \mid y^i = 0^i) \le 1$, it is enough to
show that $\Pr[y^i = 0^i]$ converges to 0. For this, observe
that for any $j$, if $y_j = 0$, then $(x_{j-1}, x_j) \ne (1,1)$. Hence, if
$y^i = 0^i$, then $(x_1, x_2) \ne (1,1)$, $(x_3, x_4) \ne (1,1)$, and so on.
Thus, $\Pr[y^i = 0^i] \le (3/4)^{\lfloor i/2 \rfloor}$, which suffices to prove the
lemma. ∎

So, letting $i \to \infty$ in (33), we obtain

$$H(Y) = 2 \sum_{j=2}^{\infty} h(\beta_j) \Pr[(y_j, \ldots, y_1) = 0^{j-1}1]. \tag{34}$$


The proof of Proposition 14 will be complete once we prove
the next two lemmas.

Lemma 21 For $j \ge 2$, we have

$$\Pr[(y_j, y_{j-1}, \ldots, y_1) = 0^{j-1}1] = \frac{1}{4(1+p)} \prod_{k=2}^{j-1} (1 - \beta_k).$$

Proof: From the definition of $\beta_j$, we readily obtain

$$\Pr[(y_j, \ldots, y_1) = 0^{j-1}1] = \left[ \prod_{k=2}^{j-1} (1 - \beta_k) \right] \Pr[(y_2, y_1) = (0,1)].$$

We must show that $\Pr[(y_2, y_1) = (0,1)] = \frac{1}{4(1+p)}$.
We write

$$\Pr[(y_2, y_1) = (0,1)] = \sum_{(a,b) \in \{0,1\}^2} \Pr[(y_2, y_1) = (0,1) \mid (u_2, u_1) = (a,b)] \times \Pr[(u_2, u_1) = (a,b)].$$

Clearly, $\Pr[(u_2, u_1) = (1,1)] = 0$. Also, $\Pr[(y_2, y_1) = (0,1) \mid
(u_2, u_1) = (1,0)] = 0$, since, given $(u_2, u_1) = (1,0)$ we
must have $y_2 = x_1 = y_1$, by virtue of (13). Next, given
$(u_2, u_1) = (0,0)$, we have $(y_2, y_1) = (x_2, x_1)$, and since
$(x_2, x_1)$ is independent of $(u_2, u_1)$, we find that

$$\Pr[(y_2, y_1) = (0,1) \mid (u_2, u_1) = (0,0)] = \Pr[(x_2, x_1) = (0,1)] = 1/4.$$

By a similar argument, $\Pr[(y_2, y_1) = (0,1) \mid (u_2, u_1) =
(0,1)] = 1/4$. Hence,

$$\Pr[(y_2, y_1) = (0,1)] = \tfrac{1}{4} \Pr[u_2 = 0] = \frac{1}{4(1+p)},$$

as desired. ∎



Lemma 22 $\beta_2 = \frac{1}{2}(1 - p)$, and for $j \ge 3$, $\beta_j$ satisfies the
recursion (24).

Proof: From (32), we have

$$\beta_2 = \tfrac{1}{2} \Pr[u_2 = 0 \mid (u_1, x_1) = (0,0)]
= \tfrac{1}{2} \Pr[u_2 = 0 \mid u_1 = 0] = \tfrac{1}{2}(1 - p).$$

For convenience, define, for $j \ge 2$, $E_j =
\{(y_{j-1}, y_{j-2}, \ldots, y_2) = 0^{j-2},\ (u_1, x_1) = (0,0)\}$, so
that $\beta_j = \frac{1}{2} \Pr[u_j = 0 \mid E_j] = \frac{1}{2}(1 - \gamma_j)$, where
$\gamma_j := \Pr[u_j = 1 \mid E_j]$. We shall show that for $j \ge 3$,

$$\gamma_j = \frac{p\,(1 - \gamma_{j-1})}{1 + \gamma_{j-1}}, \tag{35}$$

which is equivalent to the recursion in (24).

So, let $j \ge 3$ be fixed. We start with

$$\begin{aligned}
\gamma_j &= \sum_{b \in \{0,1\}} \Pr[u_j = 1 \mid u_{j-1} = b] \Pr[u_{j-1} = b \mid E_j] \\
&= p \cdot \Pr[u_{j-1} = 0 \mid E_j] \\
&= p \cdot \Pr[u_{j-1} = 0 \mid y_{j-1} = 0,\ E_{j-1}] \\
&= p \cdot \frac{\Pr[y_{j-1} = 0 \mid u_{j-1} = 0,\ E_{j-1}]\,(1 - \gamma_{j-1})}{\Pr[y_{j-1} = 0 \mid E_{j-1}]},
\end{aligned}$$

where we have used $\Pr[u_{j-1} = 0 \mid E_{j-1}] = 1 - \gamma_{j-1}$ for the
last equality.

Given $u_{j-1} = 0$, we have $y_{j-1} = x_{j-1}$, and since $x_{j-1}$
is independent of $u_{j-1}$ and $E_{j-1}$, the numerator in the last
expression above evaluates to $\frac{1}{2}(1 - \gamma_{j-1})$. Thus,

$$\gamma_j = p \cdot \frac{\frac{1}{2}(1 - \gamma_{j-1})}{\Pr[y_{j-1} = 0 \mid E_{j-1}]}. \tag{36}$$

Turning to the denominator, we write $\Pr[y_{j-1} = 0 \mid E_{j-1}]$ as

$$\begin{aligned}
&\sum_{b \in \{0,1\}} \Pr[y_{j-1} = 0 \mid u_{j-1} = b,\ E_{j-1}] \Pr[u_{j-1} = b \mid E_{j-1}] \\
&\quad = \tfrac{1}{2}(1 - \gamma_{j-1}) + \Pr[y_{j-1} = 0 \mid u_{j-1} = 1,\ E_{j-1}] \cdot \gamma_{j-1}.
\end{aligned} \tag{37}$$

We claim that $\Pr[y_{j-1} = 0 \mid u_{j-1} = 1,\ E_{j-1}] = 1$. Indeed,
given $u_{j-1} = 1$, we have $y_{j-1} = x_{j-2}$. Furthermore, we must
have $u_{j-2} = 0$ with probability 1, so that $x_{j-2} = y_{j-2}$. Thus,
given $u_{j-1} = 1$, we must have $y_{j-1} = y_{j-2}$ with probability
1. But note that the event $E_{j-1}$ implies $y_{j-2} = 0$: if $j = 3$, this
follows from $(u_1, x_1) = (0,0)$, and if $j \ge 4$, this is contained
within $(y_{j-2}, \ldots, y_2) = 0^{j-3}$. Thus, given $u_{j-1} = 1$ and
$E_{j-1}$, we have $y_{j-1} = y_{j-2} = 0$ with probability 1.
So, carrying on from (37), we get

$$\Pr[y_{j-1} = 0 \mid E_{j-1}] = \tfrac{1}{2}(1 - \gamma_{j-1}) + \gamma_{j-1} = \tfrac{1}{2}(1 + \gamma_{j-1}).$$

Feeding this back into (36), we obtain

$$\gamma_j = p \cdot \frac{\frac{1}{2}(1 - \gamma_{j-1})}{\frac{1}{2}(1 + \gamma_{j-1})},$$

which is the desired recursion (35). ∎

This concludes the proof of Proposition 14.
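
The equivalence between the recursion for $\gamma_j$ used in the proof of Lemma 22 and the recursion (24) for $\beta_j$ is a one-line substitution. As a worked check, put $\gamma_j = 1 - 2\beta_j$ (which is just $\beta_j = \frac{1}{2}(1 - \gamma_j)$ rearranged) into $\gamma_j = p(1 - \gamma_{j-1})/(1 + \gamma_{j-1})$:

```latex
\begin{aligned}
1 - 2\beta_j
  &= \frac{p\,\big(1 - (1 - 2\beta_{j-1})\big)}{1 + (1 - 2\beta_{j-1})}
   = \frac{p\,\beta_{j-1}}{1 - \beta_{j-1}}, \\[4pt]
\beta_j
  &= \frac{1}{2}\left(1 - \frac{p\,\beta_{j-1}}{1 - \beta_{j-1}}\right)
   = \frac{1}{2}\left(\frac{1 - (1+p)\,\beta_{j-1}}{1 - \beta_{j-1}}\right).
\end{aligned}
```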

B.2. Proof of Proposition 15

We break the proof into two parts. We first show that

$$\lim_{n\to\infty} \frac{1}{n} H(z^n \mid x^n) = \sum_{j=2}^{\infty} 2^{-j} H(u_j \mid u_1) \tag{38}$$

and subsequently, we prove that

$$\sum_{j=2}^{\infty} 2^{-j} H(u_j \mid u_1) = \frac{1 + p/2}{1 + p} \sum_{j=2}^{\infty} 2^{-j}\, h\!\left( \frac{1 - (-p)^j}{1 + p} \right). \tag{39}$$

To show (38), we start with

$$H(z^n \mid x^n) = \sum_{i=1}^{n} H(z_i \mid z_1, \ldots, z_{i-1}, x^n).$$

From (14), it is evident that $z_i$ is independent of $x_j$ for $j > i$.
Hence,

$$H(z^n \mid x^n) = \sum_{i=1}^{n} H(z_i \mid z_1, \ldots, z_{i-1}, x^i).$$

As a result, by the Cesàro mean theorem,

$$\lim_{n\to\infty} \frac{1}{n} H(z^n \mid x^n) = \lim_{i\to\infty} H(z_i \mid z_1, \ldots, z_{i-1}, x^i),$$

provided the latter limit exists.

To evaluate $H(z_i \mid z_1, \ldots, z_{i-1}, x^i)$, we define the events

$$A_0 = \{x^i : x_i = x_{i-1}\},$$

$$A_j = \{x^i : x_i \ne x_{i-1} = \cdots = x_{i-j} \ne x_{i-j-1}\}, \quad 1 \le j \le i-2,$$

and $A_{i-1} = \{x^i : x_i \ne x_{i-1} = \cdots = x_1\}$. These events
partition the space $\{0,1\}^i$ to which $x^i$ belongs. Since $x$ is
an i.i.d. uniform Bernoulli sequence, we have $\Pr[x^i \in A_j] =
(1/2)^{j+1}$ for $0 \le j \le i-2$, and $\Pr[x^i \in A_{i-1}] = (1/2)^{i-1}$.

Now, if $x^i \in A_0$, then by (14), we have $z_i = 0$.
Consequently, $H(z_i \mid z_1, \ldots, z_{i-1}, x^i \in A_0) = 0$.

If $x^i \in A_j$ for some $j \in [1, i-2]$, then we have $z_i = u_i$,
$z_{i-1} = \cdots = z_{i-j+1} = 0$, and $z_{i-j} = u_{i-j}$. Thus,

$$\begin{aligned}
H(z_i \mid z_1, \ldots, z_{i-1}, x^i \in A_j)
&= H(u_i \mid z_1, \ldots, z_{i-j-1}, u_{i-j}, x^i \in A_j) \\
&\overset{(a)}{=} H(u_i \mid u_{i-j}) \overset{(b)}{=} H(u_{j+1} \mid u_1).
\end{aligned}$$

Equality (a) above is due to the fact that $u$ is a first-order
Markov chain independent of $x$, while equality (b)
is a consequence of the stationarity of $u$ (which is itself a
consequence of the stationarity of the state sequence $s$).

Finally, if $x^i \in A_{i-1}$, then $z_i = u_i$ and $z_{i-1} = \cdots = z_2 = 0$.
Thus,

$$H(z_i \mid z_1, \ldots, z_{i-1}, x^i \in A_{i-1}) = H(u_i \mid z_1).$$

Therefore,

$$\begin{aligned}
H(z_i \mid z_1, \ldots, z_{i-1}, x^i)
&= \sum_{j=0}^{i-1} H(z_i \mid z_1, \ldots, z_{i-1}, x^i \in A_j) \Pr[x^i \in A_j] \\
&= \sum_{j=1}^{i-2} H(u_{j+1} \mid u_1)\, 2^{-j-1} + H(u_i \mid z_1)\, 2^{-i+1}.
\end{aligned}$$

Letting $i \to \infty$, we obtain (38).

It remains to prove (39). For this, note first that $H(u_j \mid
u_1) = H(u_j \mid u_1 = 0) \Pr[u_1 = 0] + H(u_j \mid u_1 = 1) \Pr[u_1 =
1]$. Furthermore, since $u_1 = 1$ implies $u_2 = 0$ with probability
1, we have, for all $j \ge 2$,

$$H(u_j \mid u_1 = 1) = H(u_j \mid u_2 = 0) = H(u_{j-1} \mid u_1 = 0),$$

the last equality following from the stationarity of $u$. Hence,

$$\sum_{j=2}^{\infty} 2^{-j} H(u_j \mid u_1 = 1) = \sum_{j=2}^{\infty} 2^{-j} H(u_{j-1} \mid u_1 = 0)
= \frac{1}{2} \sum_{j=2}^{\infty} 2^{-j} H(u_j \mid u_1 = 0),$$

since $H(u_1 \mid u_1 = 0) = 0$. Putting it all together, we find that

$$\begin{aligned}
\sum_{j=2}^{\infty} 2^{-j} H(u_j \mid u_1)
&= \left( \Pr[u_1 = 0] + \tfrac{1}{2} \Pr[u_1 = 1] \right) \sum_{j=2}^{\infty} 2^{-j} H(u_j \mid u_1 = 0) \\
&= \frac{1 + p/2}{1 + p} \sum_{j=2}^{\infty} 2^{-j} H(u_j \mid u_1 = 0).
\end{aligned}$$

Finally, observe that $H(u_j \mid u_1 = 0) = h\!\left( \frac{1 - (-p)^j}{1+p} \right)$, as it
can be shown (for example, by induction) that $\Pr(u_j = 0 \mid
u_1 = 0) = \frac{1 - (-p)^j}{1+p}$ for all $j \ge 1$. This proves (39), and with
this, the proof of Proposition 15 is complete.
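
The closed form for $\Pr(u_j = 0 \mid u_1 = 0)$ used at the end of the proof of Proposition 15 can be confirmed by iterating the transition probabilities in (11). This is a small sketch with hypothetical function names; it uses the one-step update $a \mapsto a(1-p) + (1-a)$, where $a$ is the probability of being in state 0.

```python
def prob_u_zero(p, j):
    """Pr(u_j = 0 | u_1 = 0), obtained by iterating the chain in (11)."""
    a = 1.0  # at j = 1 the chain is in state 0 by assumption
    for _ in range(j - 1):
        # From state 0 stay at 0 w.p. 1 - p; from state 1 return to 0 w.p. 1.
        a = a * (1.0 - p) + (1.0 - a)
    return a

def prob_u_zero_closed(p, j):
    """Closed form (1 - (-p)^j) / (1 + p)."""
    return (1.0 - (-p) ** j) / (1.0 + p)
```

The update simplifies to $a \mapsto 1 - pa$, whose fixed point is $1/(1+p)$, the stationary probability of state 0.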



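B.3. Proof of Proposition 17

The error in truncating the $R^{g}(p)$ series at the index $j = J$
is

$$|R^{g}(p) - (T_J - S_J)| \le |H(Y) - T_J| + \left| \lim_{n\to\infty} \frac{1}{n} H(z^n \mid x^n) - S_J \right|. \tag{40}$$

It is easy to bound the second term in (40):

$$\begin{aligned}
\left| \lim_{n\to\infty} \frac{1}{n} H(z^n \mid x^n) - S_J \right|
&= \frac{1 + p/2}{1 + p} \sum_{j=J+1}^{\infty} 2^{-j}\, h\!\left( \frac{1 - (-p)^j}{1 + p} \right) \\
&\le \frac{1 + p/2}{1 + p}\, 2^{-J}.
\end{aligned} \tag{41}$$

Turning our attention to the first term in (40), we see that

$$\begin{aligned}
|H(Y) - T_J| &= \frac{1}{2(1+p)} \sum_{j=J+1}^{\infty} h(\beta_j) \prod_{k=2}^{j-1} (1 - \beta_k) \\
&\le \frac{1}{2(1+p)} \sum_{j=J+1}^{\infty} \prod_{k=2}^{j-1} (1 - \beta_k).
\end{aligned} \tag{42}$$

Now, from the recursion (24), we readily get for $k \ge 3$,

$$1 - \beta_k = \frac{1}{2} \left( \frac{1 - (1-p)\,\beta_{k-1}}{1 - \beta_{k-1}} \right),$$

and hence,

$$(1 - \beta_k)(1 - \beta_{k-1}) = \tfrac{1}{2} \left[ 1 - (1-p)\,\beta_{k-1} \right] \le \tfrac{1}{2}.$$

Consequently, if $j = 2m$ for some $m \ge 1$, then

$$\prod_{k=2}^{j-1} (1 - \beta_k) = \prod_{k=1}^{m-1} (1 - \beta_{2k})(1 - \beta_{2k+1}) \le (1/2)^{m-1},$$

and if $j = 2m + 1$ for some $m \ge 1$, then

$$\prod_{k=2}^{j-1} (1 - \beta_k) \le \prod_{k=1}^{m-1} (1 - \beta_{2k+1})(1 - \beta_{2k}) \le (1/2)^{m-1}.$$

Upon replacing the bound in (42) by the looser

$$\frac{1}{2(1+p)} \sum_{j=2\lfloor (J+1)/2 \rfloor}^{\infty} \prod_{k=2}^{j-1} (1 - \beta_k),$$

so that the summation starts at an even index $j$, routine
algebraic manipulations now yield

$$|H(Y) - T_J| \le \frac{1}{1+p}\, 2^{-\lfloor (J+1)/2 \rfloor}.$$

Plugging this and (41) into (40), we obtain Proposition 17.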